Abstract. Recent empirical research indicates that many convex optimization problems with random constraints 
exhibit a phase transition as the number of constraints increases. For example, this phenomenon emerges in the $\ell_1$ 
minimization method for identifying a sparse vector from random linear samples. Indeed, this approach succeeds 
with high probability when the number of samples exceeds a threshold that depends on the sparsity level; otherwise, 
it fails with high probability. 

This paper provides the first rigorous analysis that explains why phase transitions are ubiquitous in random 
convex optimization problems. It also describes tools for making reliable predictions about the quantitative aspects 
of the transition, including the location and the width of the transition region. These techniques apply to regularized 
linear inverse problems with random measurements, to demixing problems under a random incoherence model, 
and also to cone programs with random affine constraints. 

These applications depend on foundational research in conic geometry. This paper introduces a new summary 
parameter, called the statistical dimension, that canonically extends the dimension of a linear subspace to the class 
of convex cones. The main technical result demonstrates that the sequence of conic intrinsic volumes of a convex 
cone concentrates sharply near the statistical dimension. This fact leads to an approximate version of the conic 
kinematic formula that gives bounds on the probability that a randomly oriented cone shares a ray with a fixed cone. 



1. Motivation and contributions 

A phase transition is a sharp change in the character of a computational problem as its parameters vary. 
Recent work indicates that phase transitions emerge in a variety of random convex optimization problems 
from mathematical signal processing and computational statistics; for example, see [DT09b, Sto09, RFP10, 
CSPW11, MT12, DGM13]. This paper develops geometric tools that allow us to identify the location of these 
phase transitions using geometric invariants associated with the mathematical program. Our analysis provides 
the first fully rigorous account of transition phenomena in random linear inverse problems, random demixing 
problems, and random cone programs. 

1.1. Vignette: Compressed sensing. To illustrate our goals, we discuss the compressed sensing problem, a 
familiar example where a phase transition is plainly visible in numerical experiments [DT09b]. Let $\mathbf{x}_0 \in \mathbb{R}^d$ be 
an unknown vector with $s$ nonzero entries. Suppose we have access to a vector $\mathbf{z}_0 \in \mathbb{R}^m$ consisting of random 
linear samples of $\mathbf{x}_0$, where the number $m$ of samples is smaller than the ambient dimension $d$. More precisely, 
assume that $\mathbf{z}_0 = \mathbf{A}\mathbf{x}_0$ where $\mathbf{A}$ is an $m \times d$ matrix with independent standard normal entries. The aim is to 
reconstruct $\mathbf{x}_0$ given the measurement vector $\mathbf{z}_0$, the measurement matrix $\mathbf{A}$, and the prior knowledge that $\mathbf{x}_0$ 
is sparse. One may regard this formulation as a toy model for understanding when it is possible to solve an 
underdetermined linear inverse problem with a structured unknown. 
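
As an illustration of this measurement model, the following Python sketch draws one random instance. The helper names (`sparse_vector`, `gaussian_matrix`, `measure`) and the choice of standard normal values for the nonzero entries are assumptions made for this example; they are not taken from the experimental protocol in Appendix A.

```python
import random


def sparse_vector(d, s, rng):
    """Return a length-d list with exactly s nonzero entries (assumed standard normal)."""
    x0 = [0.0] * d
    for i in rng.sample(range(d), s):
        value = 0.0
        # Redraw in the probability-zero event that a Gaussian draw is exactly 0.
        while value == 0.0:
            value = rng.gauss(0.0, 1.0)
        x0[i] = value
    return x0


def gaussian_matrix(m, d, rng):
    """Return an m x d matrix (list of rows) with independent standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def measure(A, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


rng = random.Random(0)        # fixed seed so the instance is reproducible
d, s, m = 100, 10, 40         # illustrative sizes with m < d
x0 = sparse_vector(d, s, rng)
A = gaussian_matrix(m, d, rng)
z0 = measure(A, x0)           # the m random linear samples of x0
```

The sketch only generates the data $(\mathbf{A}, \mathbf{z}_0)$; solving the reconstruction problem itself requires a convex solver, which is outside the scope of this example.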

A well-established approach [CDS01, CT06, Don06a] to the compressed sensing problem is the method of 
$\ell_1$ minimization: 

minimize $\|\mathbf{x}\|_1$ subject to $\mathbf{z}_0 = \mathbf{A}\mathbf{x}$. (1.1) 

We say that the convex program (1.1) succeeds at solving the compressed sensing problem when it has a 
unique optimal point $\hat{\mathbf{x}}$ that equals the true unknown $\mathbf{x}_0$; otherwise, it fails. 

Figure 1.1 depicts the results of a computer experiment designed to estimate the empirical probability 
that (1.1) succeeds (with respect to the randomness in $\mathbf{A}$) as the sparsity level $s$ and the number $m$ of samples 
range from zero to the ambient dimension $d$. The plot evinces that, for a given sparsity level $s$, the $\ell_1$ 
minimization technique (1.1) almost always succeeds when we have an adequate number $m$ of samples, 
while it almost always fails when we have fewer samples. See Appendix A for the experimental details. 

[Figure 1.1 plot; panel title: Compressed sensing with $\ell_1$ minimization] 
Figure 1.1: Empirical phase transition in compressed sensing. The colormap indicates the empirical 
probability that the $\ell_1$ minimization problem (1.1) successfully recovers a sparse vector $\mathbf{x}_0 \in \mathbb{R}^{100}$ from the vector 
$\mathbf{z}_0 = \mathbf{A}\mathbf{x}_0$ of random linear measurements, where $\mathbf{A}$ is a standard normal matrix. The probability of success 
increases with brightness from certain failure (black) to certain success (white). 

Figure 1.1 raises several interesting questions about the performance of the $\ell_1$ minimization method for 
solving the compressed sensing problem: 

• What is the probability of success? For a given pair (s, m) of parameters, can we estimate the 
probability that (1.1) succeeds or fails? 

• Does a phase transition exist? Is there a simple curve $m = \psi(s)$ that separates the parameter space 
into regions where (1.1) succeeds or fails most of the time? 

• Where is the edge of the phase transition? Can we find a formula for the location of this threshold 
between success and failure? 

• How wide is the transition region? For a given sparsity level s and ambient dimension d, how big 
is the range of m where the probability of success and failure are comparable? 

• Why does the transition exist? Is there a geometric explanation for the phase transition in compressed 
sensing? Can we export this reasoning to understand other problems? 

In Section 10, we summarize a large corpus of research that has attempted to address these questions. 
Unfortunately, the current results are fragmentary, even for the vanilla compressed sensing problem. This 
work provides a detailed answer to each of the questions we have posed. 

1.2. Contributions. We approach phase transition phenomena by studying the intrinsic geometric properties 
of convex cones. This attack is appropriate for the compressed sensing problem because we can express the 
optimality condition for (1.1) in terms of a descent cone of the $\ell_1$ norm. Our techniques apply more broadly 
because convex cones play a central role in convex optimization. Let us summarize the main contributions of 
this work. 

• We introduce a new summary parameter for convex cones, which we call the statistical dimension. 
This quantity canonically extends the linear dimension of a subspace to the class of convex cones. 
(See Definition 2.1, Proposition 4.1, Proposition 5.11, and Proposition 5.12.) 

• We prove that a regularized linear inverse problem with random measurements, such as the compressed 
sensing problem, must exhibit a phase transition as the number of random measurements 
increases. The location and width of the transition are controlled by the statistical dimension of a cone 
associated with the regularizer and the unknown. (Theorem II, Theorem 7.1, and Proposition 9.1.) 

• The literature [MT12] describes convex programming methods for decomposing a superposition of 
two structured, randomly oriented vectors into its constituents. These methods exhibit a phase transition 
whose properties depend on the total statistical dimension of two convex cones. (Theorem III 
and Theorem 7.1.) 

• A cone program with random affine constraints displays a phase transition as the number of constraints 
increases. We can predict the transition using the statistical dimension of the cone. (Theorem 8.1.) 

• In Section 4, we explain how to compute the statistical dimension for several important families of 
cones that arise in convex optimization and mathematical signal processing. These calculations are 
not substantially novel, but we provide the first rigorous proof that they are sharp in high dimensions. 
(Propositions 4.2, 4.3, 4.7, 4.8, and 4.9 and Theorem 4.5.) 

These applied results are supported by foundational research in conic geometry. Our theoretical analysis 
identifies a new geometric phenomenon that governs all the phase transitions mentioned above. 

• We prove that the sequence of conic intrinsic volumes of a general convex cone concentrates at the 
statistical dimension of the cone. This result provides a new family of nontrivial inequalities among 
the conic intrinsic volumes. (Theorem 6.1 and Section 6.1.) 

• The concentration result implies an approximate version of the kinematic formula for cones. This 
result uses the statistical dimension to bound the probability that a randomly oriented cone shares a 
ray with a fixed cone. (Theorem I and Theorem 7.1.) 

As an added bonus, we provide the final ingredient needed to resolve a series of conjectures [DMM09a, 
DJM11, DGM13] about the coincidence between the minimax risk of denoising and the location of phase 
transitions in linear inverse problems. Indeed, Oymak & Hassibi [OH12] have recently shown that the 
minimax risk is equivalent with the statistical dimension of an appropriate cone, and our results prove that 
the phase transition occurs at precisely this spot. See Section 10.4 for further details. 

1.3. Roadmap. Section 2 explains the connection between conic geometry and phase transitions in two 
applications. We analyze both problems, and we showcase computer experiments that confirm the accuracy 
of our predictions. Section 3 introduces notation and background material. In Section 4, we explain how to 
compute the statistical dimension for several important families of cones, and we prove that the calculations 
are sharp. Section 5 summarizes the central concepts from conic geometric probability, including the definition 
of conic intrinsic volumes. Sections 6 and 7 form the technical kernel of the paper. Here, we establish that the 
sequence of conic intrinsic volumes concentrates at the statistical dimension, and we derive an approximate 
kinematic formula as a consequence. Afterward, we return to applications. Section 8 shows that cone 
programs with random affine constraints exhibit a phase transition. Section 9 uses our theory to prove that a 
certain linear inverse problem is bound to fail. Finally, we canvass the related work in Section 10. 

The paper contains five substantial appendices, which house many of the technical details. Appendix A 
provides information about the computer experiments. Appendix B justifies the methods for computing the 
statistical dimension of a descent cone. Appendix C completes several statistical dimension calculations. 
Appendix D contains most of the analysis required to show that the sequence of conic intrinsic volumes 
concentrates. Finally, Appendix E establishes a connection between the statistical dimension and another 
geometric quantity called the Gaussian width. 

2. Conic geometry and phase transitions 

The existence of phase transitions depends on a striking new fact about the geometry of cones. Indeed, our 
entire approach can be summarized in the following precept: 

Each convex cone $C$ admits a "dimension" parameter $\delta(C)$. For certain purposes, the cone behaves 
like a linear subspace with dimension $[\delta(C)]$. 

This section introduces the quantity $\delta(C)$, which we call the statistical dimension of the cone. Then we 
state an approximate kinematic formula, which uses the statistical dimension to express the probability that 
two randomly oriented cones intersect nontrivially. This theorem parallels familiar statements about the 
intersection of two randomly oriented subspaces. Afterward, we apply the approximate kinematic formula 
to explain why phase transitions arise in random instances of regularized linear inverse problems (such as 
compressed sensing) and in a related class of demixing problems. 

2.1. The statistical dimension of a convex cone. For certain purposes, the dimension of a linear subspace 
carries all pertinent information about the subspace. In particular, we can resolve questions in stochastic 
geometry, such as the probability that two random subspaces have a nontrivial intersection, as soon as we 
know the dimensions of the subspaces. 

Consider the more general problem of determining the probability that a randomly oriented subspace 
shares a ray with a fixed convex cone. Prima facie, it seems impossible that we could answer this question 
without detailed information about the cone. Nonetheless, there is a single parameter that encapsulates all 
the relevant geometry of the cone. 

Definition 2.1 (Statistical dimension). The statistical dimension $\delta(C)$ of a closed convex cone $C \subset \mathbb{R}^d$ is defined 
as 

$\delta(C) := \mathbb{E}\big[\|\Pi_C(\mathbf{g})\|^2\big]$, (2.1) 
where $\mathbf{g} \in \mathbb{R}^d$ is a standard normal vector, $\|\cdot\|$ is the Euclidean norm, and $\Pi_C$ denotes the Euclidean projection 
onto the cone $C$: 

$\Pi_C(\mathbf{x}) := \arg\min\{\|\mathbf{x} - \mathbf{y}\| : \mathbf{y} \in C\}$. 
We define the statistical dimension of a general convex cone to be the statistical dimension of its closure. 

As we show in Section 5.3, the statistical dimension canonically extends the linear dimension of a subspace 
to the class of convex cones. Let us highlight a few properties that support this claim. First, the statistical 
dimension of a subspace $L \subset \mathbb{R}^d$ satisfies 

$\delta(L) = \mathbb{E}\big[\|\Pi_L(\mathbf{g})\|^2\big] = \dim(L)$. (2.2) 

Second, the statistical dimension of a cone $C \subset \mathbb{R}^d$ is rotationally invariant: 

$\delta(\mathbf{U}C) = \delta(C)$ for each orthogonal matrix $\mathbf{U} \in \mathbb{R}^{d \times d}$. (2.3) 

Third, the statistical dimension increases with the size of the cone in the sense that $C \subset K$ implies that 
$\delta(C) \le \delta(K)$. See Proposition 4.1 for details about these and other properties of the statistical dimension. 
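
The definition (2.1) suggests a simple Monte Carlo check of these properties. The sketch below, written in plain Python with illustrative parameter choices, estimates the statistical dimension of a coordinate subspace and of the nonnegative orthant, two cones whose Euclidean projections have closed forms. The function names are hypothetical. For a $k$-dimensional coordinate subspace, the estimate should be near $k$, in agreement with (2.2); for the orthant in $\mathbb{R}^d$, a direct calculation gives $d/2$, because each coordinate contributes $\mathbb{E}[\max(g_i, 0)^2] = 1/2$.

```python
import random


def project_orthant(g):
    """Euclidean projection onto the nonnegative orthant: clip negatives to zero."""
    return [max(v, 0.0) for v in g]


def project_coordinate_subspace(g, k):
    """Euclidean projection onto span(e_1, ..., e_k): zero out the other coordinates."""
    return [v if i < k else 0.0 for i, v in enumerate(g)]


def estimate_statistical_dimension(project, d, trials, rng):
    """Average ||project(g)||^2 over independent standard normal vectors g in R^d."""
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += sum(v * v for v in project(g))
    return total / trials


rng = random.Random(1)            # fixed seed for reproducibility
d, k, trials = 20, 7, 20000       # illustrative sizes

delta_subspace = estimate_statistical_dimension(
    lambda g: project_coordinate_subspace(g, k), d, trials, rng
)
delta_orthant = estimate_statistical_dimension(project_orthant, d, trials, rng)

print(f"subspace: estimate {delta_subspace:.2f}, dim = {k}")
print(f"orthant:  estimate {delta_orthant:.2f}, d/2 = {d / 2}")
```

The same estimator applies to any cone for which the projection $\Pi_C$ can be evaluated; only the `project` argument changes.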

Section 4 explains how to calculate the statistical dimension for several important families of convex cones. 
The examples include self-dual cones, circular cones, descent cones of the $\ell_1$ norm, and descent cones of the 
Schatten 1-norm.¹ In all of these cases, we prove rigorously that our estimates for the statistical dimension are 
sharp in high dimensions. 

Remark 2.2 (Related concepts). The statistical dimension of a cone is closely related to its Gaussian width, 
another summary parameter for cones that has been proposed in the literature [RV08, Sto09, CRPW12]. 
See Section 10.3 for further discussion of Gaussian widths. Recent work of Oymak & Hassibi [OH12] also 
illuminates a connection between the statistical dimension and the minimax risk for denoising problems; see 
Section 10.4. 

2.2. Kinematics and statistical dimension. Conic integral geometry [SW08, Ch. 6] studies properties of 
convex cones that are invariant under rotation and reflection. A crowning achievement in this field is the 
conic kinematic formula [SW08, Thm. 6.5.6], which yields the exact probability that two randomly oriented 
convex cones intersect nontrivially. As recognized in [Ame11, MT12], this result is tailor-made for studying 
random instances of convex optimization problems. Our main applied result is an approximate kinematic 
formula, expressed in terms of the statistical dimension. 

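Theorem I (Approximate kinematic formula). Fix a tolerance $\eta \in (0,1)$. Suppose that $C, K \subset \mathbb{R}^d$ are closed 
convex cones, one of which is not a subspace. Draw an orthogonal matrix $\mathbf{Q} \in \mathbb{R}^{d \times d}$ uniformly at random. Then 

$\delta(C) + \delta(K) \le d - a_\eta \sqrt{d} \implies \mathbb{P}\{C \cap \mathbf{Q}K = \{\mathbf{0}\}\} \ge 1 - \eta$; 

$\delta(C) + \delta(K) \ge d + a_\eta \sqrt{d} \implies \mathbb{P}\{C \cap \mathbf{Q}K = \{\mathbf{0}\}\} \le \eta$. 

The quantity $a_\eta := 4\sqrt{\log(4/\eta)}$. For example, $a_{0.01} < 10$ and $a_{0.001} < 12$. 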
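
To make the constants in Theorem I concrete, the following Python sketch evaluates the tolerance-dependent constant and classifies a pair of statistical dimensions against the two thresholds. It assumes that the logarithm is the natural logarithm and that the inequalities in the theorem are non-strict; the function names `a_eta` and `kinematic_regime` are hypothetical.

```python
import math


def a_eta(eta):
    """Constant 4 * sqrt(log(4 / eta)) from Theorem I (natural logarithm assumed)."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in the open interval (0, 1)")
    return 4.0 * math.sqrt(math.log(4.0 / eta))


def kinematic_regime(delta_c, delta_k, d, eta):
    """Compare delta(C) + delta(K) with the thresholds d -/+ a_eta * sqrt(d)."""
    total = delta_c + delta_k
    margin = a_eta(eta) * math.sqrt(d)
    if total <= d - margin:
        return "trivial intersection with probability >= 1 - eta"
    if total >= d + margin:
        return "trivial intersection with probability <= eta"
    return "inside the transition window; Theorem I gives no bound"


# The two numerical examples quoted after the theorem.
print(a_eta(0.01) < 10.0, a_eta(0.001) < 12.0)

# Hypothetical statistical dimensions in ambient dimension d = 10_000.
print(kinematic_regime(3000.0, 4000.0, 10_000, 0.01))
print(kinematic_regime(5000.0, 5050.0, 10_000, 0.01))
```

The window between the two thresholds has width $2 a_\eta \sqrt{d}$, which is small relative to $d$ only when the ambient dimension is large.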



¹ The Schatten 1-norm equals the sum of the singular values of a matrix. 

Theorem I allows us to control the probability that two randomly oriented cones strike as soon as we know 
their statistical dimensions. Roughly, the cones are likely to share a ray if and only if the total dimension of the 
two cones exceeds the ambient dimension. This claim is in perfect sympathy with the analogous statement 
about subspaces. In the next few sections, we explore applications of this pluripotent result. 

Section 7 contains the proof of Theorem I, which has two main ingredients. The first piece is the conic 
kinematic formula, which expresses the exact intersection probability in terms of geometric invariants called 
conic intrinsic volumes [SW08, Sec. 6.5]. Each cone in $\mathbb{R}^d$ has $d+1$ conic intrinsic volumes, much as a convex 
set in $\mathbb{R}^d$ has a volume, a surface area, a mean width, and so forth [KR97]. The second component of the proof 
is a new property of conic intrinsic volumes (Theorem 6.1). This result shows that most intrinsic volumes 
of a cone $C$ are negligible in size, except for the ones whose index is close to the statistical dimension $\delta(C)$. 
Combining these facts, we can bound the intersection probability accurately via the statistical dimension. 

2.3. Regularized linear inverse problems. Our first application of Theorem I concerns a generalization of 
the compressed sensing problem. A linear inverse problem asks us to infer an unknown vector $\mathbf{x}_0 \in \mathbb{R}^d$ from an 
observed vector $\mathbf{z}_0 \in \mathbb{R}^m$ of the form 

$\mathbf{z}_0 = \mathbf{A}\mathbf{x}_0$, (2.4) 
where $\mathbf{A} \in \mathbb{R}^{m \times d}$ is a matrix that describes a linear data acquisition process. When the matrix is fat ($m < d$), 
the inverse problem is underdetermined. As a consequence, we cannot hope to identify $\mathbf{x}_0$ unless we exploit 
some prior information about its structure. 

To solve the underdetermined linear inverse problem, we consider an approach based on optimization. 
Suppose that $f : \mathbb{R}^d \to \overline{\mathbb{R}}$ is a proper convex function² that reflects the amount of "structure" in a vector. We 
can attempt to identify the structured unknown $\mathbf{x}_0$ in (2.4) by solving a convex optimization problem: 

minimize $f(\mathbf{x})$ subject to $\mathbf{z}_0 = \mathbf{A}\mathbf{x}$. (2.5) 

The function $f$ is called a regularizer, and the formulation (2.5) is called a regularized linear inverse problem. 
To illustrate the kinds of regularizers that arise in practice, we highlight two familiar examples. 

Example 2.3 (Sparse vectors). When the vector $\mathbf{x}_0$ is known to be sparse, we can minimize the $\ell_1$ norm to 
look for sparse solutions of the inverse problem. Repeating (1.1), we have the optimization 

minimize $\|\mathbf{x}\|_1$ subject to $\mathbf{z}_0 = \mathbf{A}\mathbf{x}$. (2.6) 

This approach was proposed by Chen et al. [CDS01], motivated by work in geophysics [CM73, SS86]. 

Example 2.4 (Low-rank matrices). Suppose that $\mathbf{X}_0$ is a low-rank matrix, and we have acquired a vector of 
measurements of the form $\mathbf{z}_0 = \mathcal{A}(\mathbf{X}_0)$ where $\mathcal{A}$ is a linear operator. This process is equivalent with (2.4). We 
can look for low-rank solutions to the linear inverse problem by minimizing the Schatten 1-norm: 

minimize $\|\mathbf{X}\|_{S_1}$ subject to $\mathbf{z}_0 = \mathcal{A}(\mathbf{X})$. (2.7) 

This idea was proposed by Recht et al. [RFP10], motivated by work in control theory [MP97, Faz02]. 

The paper [CRPW12] presents a general framework for constructing a regularizer $f$ that promotes a specified 
type of structure, as well as many additional examples. 

We say that the regularized inverse problem (2.5) succeeds at solving (2.4) when the convex program has a 
unique minimizer $\hat{\mathbf{x}}$ that coincides with the true unknown; that is, $\hat{\mathbf{x}} = \mathbf{x}_0$. To develop conditions for success, 
we introduce a convex cone associated with the regularizer $f$ and the unknown $\mathbf{x}_0$. 

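Definition 2.5 (Descent cone). The descent cone $\mathcal{D}(f, \mathbf{x})$ of a function $f : \mathbb{R}^d \to \overline{\mathbb{R}}$ at a point $\mathbf{x} \in \mathbb{R}^d$ is the 
conic hull of the perturbations that do not increase $f$ near $\mathbf{x}$: 

$\mathcal{D}(f, \mathbf{x}) := \bigcup_{\tau > 0} \{\mathbf{y} \in \mathbb{R}^d : f(\mathbf{x} + \tau\mathbf{y}) \le f(\mathbf{x})\}$. 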

Chandrasekaran et al. [CRPW12, Prop. 2.1] characterize when the optimization problem (2.5) succeeds in 
terms of a descent cone. This result is simply a geometric statement of the primal optimality condition. 

Fact 2.6 (Optimality condition for linear inverse problems). The vector $\mathbf{x}_0$ is the unique optimal point of the 
convex program (2.5) if and only if $\mathcal{D}(f, \mathbf{x}_0) \cap \operatorname{null}(\mathbf{A}) = \{\mathbf{0}\}$. 
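
For the $\ell_1$ norm, membership in the descent cone can be tested directly. The sketch below relies on a standard property of piecewise linear functions rather than on a statement from this paper: for all sufficiently small $\tau > 0$, the value $\|\mathbf{x} + \tau\mathbf{y}\|_1$ equals $\|\mathbf{x}\|_1$ plus $\tau$ times the one-sided directional derivative, so $\mathbf{y}$ lies in the descent cone exactly when that derivative is nonpositive. The function names and the numerical tolerance are assumptions made for this illustration.

```python
def l1_norm(x):
    """Sum of absolute values of the entries of x."""
    return sum(abs(v) for v in x)


def l1_directional_derivative(x, y):
    """One-sided directional derivative of the l1 norm at x in the direction y."""
    total = 0.0
    for xi, yi in zip(x, y):
        if xi > 0:
            total += yi          # |x_i + t*y_i| grows at rate +y_i for small t
        elif xi < 0:
            total -= yi          # |x_i + t*y_i| grows at rate -y_i for small t
        else:
            total += abs(yi)     # off the support, any movement increases the norm
    return total


def in_l1_descent_cone(x, y, tol=1e-12):
    """Test whether y belongs to the descent cone of the l1 norm at x."""
    return l1_directional_derivative(x, y) <= tol


x = [1.0, 0.0]                   # a 1-sparse point in the plane
descent = [-1.0, 0.5]            # shrinks the support entry faster than it leaves the support
non_descent = [-1.0, 2.0]        # leaves the support too quickly

inside = in_l1_descent_cone(x, descent)
outside = in_l1_descent_cone(x, non_descent)

# Cross-check the first direction against Definition 2.5 with an explicit step.
tau = 0.5
stepped = [xi + tau * yi for xi, yi in zip(x, descent)]
decreases = l1_norm(stepped) <= l1_norm(x)
```

In the language of Fact 2.6, the $\ell_1$ program succeeds exactly when no nonzero vector in the null space of the measurement matrix passes this membership test.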



² The extended real numbers $\overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$. A proper convex function has at least one finite value, and it does not take the value $-\infty$. 

Figure 2.1: Geometry of optimality conditions. [left] The optimality condition for the regularized linear 
inverse problem (2.5) states that the descent cone of $f$ at $\mathbf{x}_0$ is tangent to the null space of $\mathbf{A}$. [right] The 
optimality condition for the convex demixing method (2.9) states that the descent cone of $f$ at $\mathbf{x}_0$ has a trivial 
intersection with a rotated copy of the descent cone of $g$ at $\mathbf{y}_0$. 



Figure 2.1 [left] illustrates the geometry of this optimality condition. Despite its simplicity, this result forges a 
crucial link between the convex optimization problem (2.5) and the theory of conic integral geometry. 

Our goal is to understand the power of convex regularization for solving linear inverse problems, as well as 
the limitations inherent in this approach. To do so, we consider the case where the measurements are generic. 
A natural modeling technique is to draw the measurement matrix A at random from the standard normal 
distribution on $\mathbb{R}^{m \times d}$. For this model, Theorem I allows us to identify a sharp transition in the performance of 
the regularized problem (2.5). 

Theorem II (Phase transitions in linear inverse problems). Fix a tolerance $\eta \in (0,1)$. Let $\mathbf{x}_0 \in \mathbb{R}^d$ be a fixed 
vector. Suppose $\mathbf{A} \in \mathbb{R}^{m \times d}$ has independent standard normal entries, and let $\mathbf{z}_0 = \mathbf{A}\mathbf{x}_0$. Then 

$m \ge \delta(\mathcal{D}(f, \mathbf{x}_0)) + a_\eta \sqrt{d} \implies$ (2.5) succeeds with probability $\ge 1 - \eta$; 

$m \le \delta(\mathcal{D}(f, \mathbf{x}_0)) - a_\eta \sqrt{d} \implies$ (2.5) succeeds with probability $\le \eta$. 

The quantity $a_\eta := 4\sqrt{\log(4/\eta)}$. 

Proof. The standard normal distribution on $\mathbb{R}^{m \times d}$ is invariant under rotation, so the null space $L = \operatorname{null}(\mathbf{A})$ 
is almost surely a uniformly random $(d-m)$-dimensional subspace of $\mathbb{R}^d$. According to (2.2), the statistical 
dimension $\delta(L) = d - m$ almost surely. The result follows immediately when we combine the optimality 
condition, Fact 2.6, and the kinematic bound, Theorem I. □ 

Theorem II proves that we always encounter a phase transition when we use the regularized formulation 
(2.5) to solve the linear inverse problem (2.4) with random measurements. The transition occurs where 
the number of measurements equals the statistical dimension of the descent cone: $m = \delta(\mathcal{D}(f, \mathbf{x}_0))$. The shift 
from failure to success takes place over a range of $O(\sqrt{d})$ measurements. 
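
The two guarantees of Theorem II translate into explicit measurement counts once the statistical dimension of the descent cone is known. The Python sketch below computes them under the assumptions that the logarithm is natural and the inequalities are non-strict; the function name `measurement_thresholds` and the numerical inputs are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def measurement_thresholds(delta, d, eta):
    """Return (m_fail, m_succeed) implied by Theorem II.

    m_fail is the largest integer m with m <= delta - a_eta * sqrt(d);
    m_succeed is the smallest integer m with m >= delta + a_eta * sqrt(d).
    An entry is None when the corresponding threshold falls outside [0, d].
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in the open interval (0, 1)")
    margin = 4.0 * math.sqrt(math.log(4.0 / eta)) * math.sqrt(d)
    m_fail = math.floor(delta - margin)
    m_succeed = math.ceil(delta + margin)
    return (
        m_fail if m_fail >= 0 else None,
        m_succeed if m_succeed <= d else None,
    )


# Hypothetical example: ambient dimension 10^6, statistical dimension 250_000.
d = 10**6
delta = 250_000.0
m_fail, m_succeed = measurement_thresholds(delta, d, eta=0.01)
print(m_fail, m_succeed, m_succeed - m_fail)
```

For small ambient dimensions the margin $a_\eta \sqrt{d}$ can be comparable to $d$ itself (roughly 98 when $d = 100$ and $\eta = 0.01$), in which case the function returns `None` for one or both thresholds and the nonasymptotic guarantee carries no information.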

In Section 4.5, we estimate the statistical dimension of the descent cone of the $\ell_1$ norm at a sparse vector 
of fixed dimension. Owing to symmetries of the $\ell_1$ norm, only the sparsity and the ambient dimension play a 
role in this calculation. When combined with Theorem II, this result yields the exact (asymptotic) location 
of the phase transition for the $\ell_1$ minimization problem (2.6) with random measurements. In Section 4.6, 
we estimate the statistical dimension of the descent cone of the Schatten 1-norm at a low-rank matrix with 
fixed dimensions. This calculation gives the exact (asymptotic) location of the phase transition for the $S_1$ 
minimization problem (2.7) with random measurements. 

Figure 2.2: Phase transitions for linear inverse problems. [left] Recovery of sparse vectors. The empirical 
probability that the $\ell_1$ minimization problem (2.6) identifies a sparse vector $\mathbf{x}_0 \in \mathbb{R}^{100}$ given random linear 
measurements $\mathbf{z}_0 = \mathbf{A}\mathbf{x}_0$. [right] Recovery of low-rank matrices. The empirical probability that the $S_1$ 
minimization problem (2.7) identifies a low-rank matrix $\mathbf{X}_0 \in \mathbb{R}^{30 \times 30}$ given random linear measurements 
$\mathbf{z}_0 = \mathcal{A}(\mathbf{X}_0)$. In each panel, the colormap indicates the empirical probability of success (black = 0%; white = 
100%). The yellow curve marks the theoretical prediction of the phase transition from Theorem II; the red curve 
traces the empirical phase transition. 

To underscore these achievements, we have performed some computer experiments to compare the 
theoretical and empirical phase transitions. Figure 2.2 [left] shows the performance of (2.6) for identifying 
a sparse vector in $\mathbb{R}^{100}$; Figure 2.2 [right] shows the performance of (2.7) for identifying a low-rank matrix 
in $\mathbb{R}^{30 \times 30}$. In each case, the colormap indicates the empirical probability of success over the randomness in 
the measurement operator. The empirical 5%, 50%, and 95% success isoclines are determined from the 
data. We also draft the theoretical phase transition curve, promised by Theorem II, where the number m of 
measurements equals the statistical dimension of the appropriate descent cone, which we compute using the 
formulas from Sections 4.5 and 4.6. See Appendix A for the experimental protocol. 

In both examples, the theoretical prediction of Theorem II coincides almost perfectly with the 50% success 
isocline. Furthermore, the phase transition takes place over a range of $O(\sqrt{d})$ values of $m$, as promised. 
Although Theorem II does not explain why the transition region tapers at the bottom-left and top-right 
corners of each plot, we have established a more detailed version of Theorem I that allows us to predict this 
phenomenon as well. See the discussion after Theorem 7.1 for more information. 

2.4. Demixing problems. In a demixing problem [MT12], we observe a superposition of two structured 
vectors, and we aim to extract the two constituents from the mixture. More precisely, suppose that we measure 
a vector $\mathbf{z}_0 \in \mathbb{R}^d$ of the form 

$\mathbf{z}_0 = \mathbf{x}_0 + \mathbf{U}\mathbf{y}_0$ (2.8) 

where $\mathbf{x}_0, \mathbf{y}_0 \in \mathbb{R}^d$ are unknown and $\mathbf{U} \in \mathbb{R}^{d \times d}$ is a known orthogonal matrix. If we wish to identify the pair 
$(\mathbf{x}_0, \mathbf{y}_0)$, we must assume that each component is structured to reduce the number of degrees of freedom. 
In addition, if the two types of structure are coherent (i.e., aligned with each other), it may be impossible 
to disentangle them, so it is expedient to include the matrix U to model the relative orientation of the two 
constituent signals. 

To solve the demixing problem (2.8), we describe a convex programming technique proposed in [MT12]. 
Suppose that $f$ and $g$ are proper convex functions on $\mathbb{R}^d$ that promote the structures we expect to find in $\mathbf{x}_0$ 
and $\mathbf{y}_0$. Then we can frame the convex optimization problem 

minimize $f(\mathbf{x})$ subject to $g(\mathbf{y}) \le g(\mathbf{y}_0)$ and $\mathbf{z}_0 = \mathbf{x} + \mathbf{U}\mathbf{y}$. (2.9) 

In other words, we seek structured vectors $\mathbf{x}$ and $\mathbf{y}$ that are consistent with the observation $\mathbf{z}_0$. This approach 
requires the side information $g(\mathbf{y}_0)$, so a Lagrangian formulation is sometimes more natural in practice. Here 
are two concrete examples of the demixing program (2.9) that are adapted from the literature. 

Example 2.7 (Sparse + sparse). Suppose that the first signal $\mathbf{x}_0$ is sparse in the standard basis, and the 
second signal $\mathbf{U}\mathbf{y}_0$ is sparse in a known basis $\mathbf{U}$. In this case, we can use $\ell_1$ norms to promote sparsity, which 
leads to the optimization 

minimize $\|\mathbf{x}\|_1$ subject to $\|\mathbf{y}\|_1 \le \|\mathbf{y}_0\|_1$ and $\mathbf{z}_0 = \mathbf{x} + \mathbf{U}\mathbf{y}$. (2.10) 

This approach for demixing sparse signals is sometimes called morphological component analysis [SDC03, 
SED05, ESQD05, BMS06]. 

Example 2.8 (Low-rank + sparse). Suppose that we observe $\mathbf{Z}_0 = \mathbf{X}_0 + \mathcal{U}(\mathbf{Y}_0)$ where $\mathbf{X}_0$ is a low-rank matrix, 
$\mathbf{Y}_0$ is a sparse matrix, and $\mathcal{U}$ is a known orthogonal transformation on the space of matrices. We can minimize 
the Schatten 1-norm to promote low rank, and we can constrain the $\ell_1$ norm to promote sparsity. The 
optimization becomes 

minimize $\|\mathbf{X}\|_{S_1}$ subject to $\|\mathbf{Y}\|_1 \le \|\mathbf{Y}_0\|_1$ and $\mathbf{Z}_0 = \mathbf{X} + \mathcal{U}(\mathbf{Y})$. (2.11) 
This demixing problem is called rank-sparsity decomposition [CSPW11]. 

See the paper [MT12] for some additional examples. 

We say that the convex program (2.9) for demixing succeeds when it has a unique solution $(\hat{\mathbf{x}}, \hat{\mathbf{y}})$ that 
coincides with the true unknown: $(\hat{\mathbf{x}}, \hat{\mathbf{y}}) = (\mathbf{x}_0, \mathbf{y}_0)$. As in the case of a linear inverse problem, we can express 
the primal optimality condition in terms of descent cones [MT12, Lem. 2.3]. 

Fact 2.9 (Optimality condition for demixing). The pair $(\mathbf{x}_0, \mathbf{y}_0)$ is the unique optimal point of the convex 
program (2.9) if and only if $\mathcal{D}(f, \mathbf{x}_0) \cap (-\mathbf{U}\mathcal{D}(g, \mathbf{y}_0)) = \{\mathbf{0}\}$. 

Figure 2.1 [right] depicts the geometry of this optimality condition. The parallel with Fact 2.6, the optimality 
condition for linear inverse problems, is striking. Indeed, the two conditions coalesce when the function g 
in (2.9) is the indicator of an affine space. 

Our goal is to understand the prospects for solving the demixing problem (2.8) with a convex program of 
the form (2.9). To that end, we use randomness to model the favorable case where the two structures have no 
interaction with each other. More precisely, we draw the matrix U uniformly at random from the orthogonal 
group. Under this assumption, Theorem I delivers a sharp transition in the performance of the optimization 
problem (2.9). 
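
The isotropic random model can be simulated with elementary tools. One standard construction, which we use here as an assumption about how the model might be implemented and not as the procedure of [MT12], orthonormalizes independent standard normal vectors by the Gram-Schmidt process; the resulting matrix is uniformly distributed on the orthogonal group. The function names and the small test vectors below are hypothetical.

```python
import math
import random


def random_orthogonal(d, rng):
    """Sample a d x d orthogonal matrix, returned as a list of d orthonormal columns."""
    columns = []
    while len(columns) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Modified Gram-Schmidt: strip the components along the columns found so far.
        for q in columns:
            coeff = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - coeff * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-8:  # degenerate draws have probability zero; guard anyway
            columns.append([vi / norm for vi in v])
    return columns


def apply_columns(columns, y):
    """Compute U y, where U is the matrix with the given columns."""
    d = len(columns[0])
    return [sum(columns[j][i] * y[j] for j in range(len(y))) for i in range(d)]


rng = random.Random(2)                      # fixed seed for reproducibility
d = 6
U = random_orthogonal(d, rng)
x0 = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]         # sparse in the standard basis
y0 = [0.0, 0.0, 2.0, 0.0, 0.0, -1.0]        # sparse coefficients in the basis U
z0 = [xi + ui for xi, ui in zip(x0, apply_columns(U, y0))]   # observation (2.8)
```

The sketch produces only the observation; recovering the pair of constituents requires solving a convex program of the form (2.9) with a suitable solver.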

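Theorem III (Phase transitions in demixing). Fix a tolerance $\eta \in (0,1)$. Let $\mathbf{x}_0$ and $\mathbf{y}_0$ be fixed vectors in $\mathbb{R}^d$. 
Draw an orthogonal matrix $\mathbf{U} \in \mathbb{R}^{d \times d}$ uniformly at random, and let $\mathbf{z}_0 = \mathbf{x}_0 + \mathbf{U}\mathbf{y}_0$. Then 

$\delta(\mathcal{D}(f, \mathbf{x}_0)) + \delta(\mathcal{D}(g, \mathbf{y}_0)) \le d - a_\eta \sqrt{d} \implies$ (2.9) succeeds with probability $\ge 1 - \eta$; 

$\delta(\mathcal{D}(f, \mathbf{x}_0)) + \delta(\mathcal{D}(g, \mathbf{y}_0)) \ge d + a_\eta \sqrt{d} \implies$ (2.9) succeeds with probability $\le \eta$. 

The quantity $a_\eta := 4\sqrt{\log(4/\eta)}$. 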

Proof. This theorem follows immediately when we combine the optimality condition, Fact 2.9, with the 
kinematic bound, Theorem I. We invoke the rotational invariance (2.3) of the statistical dimension to simplify 
the formulas. □ 

Theorem III establishes that there is always a phase transition when we use the convex program (2.9) 
to solve the demixing problem (2.8) under the isotropic random model for U. It is accurate to say that the 
optimization is effective if and only if the total statistical dimension of the two descent cones is smaller than 
the ambient dimension d. 

Our numerical work confirms the analysis in Theorem III. Figure 2.3 [left] shows when (2.10) can demix a 
sparse vector from a vector that is sparse in a random basis for $\mathbb{R}^{100}$. Figure 2.3 [right] shows when (2.11) 
can demix a low-rank matrix from a matrix that is sparse in a random basis for $\mathbb{R}^{35 \times 35}$. In each case, the 
experiment provides an empirical estimate for the probability of success with respect to the randomness in 
the measurement operator. We display the 5%, 50%, and 95% success isoclines, determined from the data. 
We also sketch the theoretical phase transition from Theorem III, which occurs when the total statistical 
dimension of the relevant cones equals the ambient dimension. The statistical dimensions of the descent cones 
are obtained from the formulas in Sections 4.5 and 4.6. See [MT12, Sec. 6] for the details of the experimental 
protocol. 

[Figure 2.3 plots; panel titles: Demixing sparse + sparse; Demixing sparse + low-rank] 
Figure 2.3: Phase transitions for demixing problems. [left] Sparse + sparse. The empirical probability 
that the convex program (2.10) successfully demixes a vector $\mathbf{x}_0 \in \mathbb{R}^{100}$ that is sparse in the standard basis from 
a vector $\mathbf{U}\mathbf{y}_0 \in \mathbb{R}^{100}$ that is sparse in the random basis $\mathbf{U}$. [right] Low rank + sparse. The empirical probability 
that the convex program (2.11) successfully demixes a low-rank matrix $\mathbf{X}_0 \in \mathbb{R}^{35 \times 35}$ from a matrix $\mathcal{U}(\mathbf{Y}_0) \in \mathbb{R}^{35 \times 35}$ 
that is sparse in the random basis $\mathcal{U}$. In each panel, the colormap indicates the empirical probability of success 
(black = 0%; white = 100%). The yellow curve marks the theoretical phase transition predicted by Theorem III. 
The red curve follows the empirical phase transition. 

Once again, we see that the theoretical curve of Theorem III coincides almost perfectly with the empirical 
50% success isocline. The width of the transition region is $O(\sqrt{d})$. Theorem III does not predict the tapering 
of the transition in the top-left and bottom-right corners, but the discussion after Theorem 7.1 exposes the 
underlying reason for this phenomenon. 

3. Notation, conventions, and background 

This section contains a short overview of our notation, as well as some important facts from convex 
geometry that we will use liberally. Some standard references include Rockafellar [Roc70], Hiriart-Urruty & 
Lemaréchal [HUL93a, HUL93b], and Rockafellar & Wets [RW98]. 

3.1. Vectors and matrices. We use boldface lowercase letters to denote vectors and boldface capital letters 
to denote matrices, so $\mathbf{x}$ is a vector and $\mathbf{X}$ is a matrix. We write $\mathbf{I}$ for the identity matrix and $\mathbf{0}$ for the zero 
vector or matrix; their dimensions are determined by context. 

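3.2. Euclidean geometry. For vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, define the Euclidean inner product $\langle \mathbf{x}, \mathbf{y} \rangle$ and the squared
Euclidean norm $\|\mathbf{x}\|^2 := \langle \mathbf{x}, \mathbf{x} \rangle$. The Euclidean unit ball and unit sphere are respectively denoted as

$$ B^d := \{ \mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\| \le 1 \} \quad\text{and}\quad S^{d-1} := \{ \mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\| = 1 \}. $$

The group of $d \times d$ orthogonal matrices is defined as

$$ O_d := \{ \mathbf{U} \in \mathbb{R}^{d \times d} : \mathbf{U}\mathbf{U}^T = \mathbf{I} \}. $$

An orthogonal basis for $\mathbb{R}^d$ is an element of $O_d$.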

3.3. Convex cones. Recall that a convex cone $C$ is a set that is convex and positive homogeneous: $C = \tau C$
for all $\tau > 0$. A convex cone is polyhedral if it can be written as the intersection of a finite number of closed
halfspaces. We introduce the family $\mathcal{C}_d$ of all nonempty, closed, convex cones in $\mathbb{R}^d$.

For a general convex cone $C \subset \mathbb{R}^d$, the polar cone $C^\circ \in \mathcal{C}_d$ is the closed convex cone

$$ C^\circ := \{ \mathbf{u} \in \mathbb{R}^d : \langle \mathbf{u}, \mathbf{x} \rangle \le 0 \text{ for all } \mathbf{x} \in C \}. $$

The normal cone $\mathcal{N}(S, \mathbf{x})$ of a convex set $S \subset \mathbb{R}^d$ at a point $\mathbf{x} \in S$ consists of the outward normals to all
hyperplanes that support $S$ at $\mathbf{x}$, i.e.,

$$ \mathcal{N}(S, \mathbf{x}) := \{ \mathbf{u} \in \mathbb{R}^d : \langle \mathbf{u}, \mathbf{y} - \mathbf{x} \rangle \le 0 \text{ for all } \mathbf{y} \in S \}. \tag{3.1} $$

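3.4. Representation of descent cones. Let $f : \mathbb{R}^d \to \overline{\mathbb{R}}$ be a proper convex function. Recall that the descent
cone of $f$ at a point $\mathbf{x}$ is given by

$$ \mathcal{D}(f, \mathbf{x}) := \bigcup_{\tau > 0} \{ \mathbf{y} \in \mathbb{R}^d : f(\mathbf{x} + \tau \mathbf{y}) \le f(\mathbf{x}) \}. $$

The polar of a descent cone has some attractive properties that we will exploit later. First, the polar of a
descent cone coincides with the normal cone of a sublevel set:

$$ \mathcal{D}(f, \mathbf{x})^\circ = \mathcal{N}(S, \mathbf{x}) \quad\text{where}\quad S = \{ \mathbf{y} \in \mathbb{R}^d : f(\mathbf{y}) \le f(\mathbf{x}) \}. \tag{3.2} $$

Next, we introduce the subdifferential $\partial f(\mathbf{x})$, which is the closed convex set

$$ \partial f(\mathbf{x}) := \{ \mathbf{u} \in \mathbb{R}^d : f(\mathbf{y}) \ge f(\mathbf{x}) + \langle \mathbf{u}, \mathbf{y} - \mathbf{x} \rangle \text{ for all } \mathbf{y} \in \mathbb{R}^d \}. $$

In particular, the subdifferential contains the origin if and only if $\mathbf{x}$ minimizes $f$. Assuming that the
subdifferential $\partial f(\mathbf{x})$ is nonempty, compact, and does not contain the origin, the result [Roc70, Cor. 23.7.1]
provides that

$$ \mathcal{D}(f, \mathbf{x})^\circ = \mathrm{cone}(\partial f(\mathbf{x})) := \bigcup_{\tau \ge 0} \tau \cdot \partial f(\mathbf{x}). \tag{3.3} $$

The expression $\tau \cdot \partial f(\mathbf{x})$ represents dilation of the subdifferential by a factor $\tau$. The relation (3.3) offers a
powerful tool for computing the statistical dimension of a descent cone. Related identities hold under weaker
technical conditions [Roc70, Thm. 23.7].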

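3.5. Euclidean projections onto sets. Let $S \subset \mathbb{R}^d$ be a closed convex set. The Euclidean distance to the set $S$
is the function

$$ \mathrm{dist}(\cdot, S) : \mathbb{R}^d \to \mathbb{R}_+ \quad\text{where}\quad \mathrm{dist}(\mathbf{x}, S) := \inf\{ \|\mathbf{x} - \mathbf{y}\| : \mathbf{y} \in S \}. $$

The Euclidean projection onto the set $S$ is the map

$$ \pi_S : \mathbb{R}^d \to S \quad\text{where}\quad \pi_S(\mathbf{x}) := \arg\min\{ \|\mathbf{x} - \mathbf{y}\| : \mathbf{y} \in S \}. $$

The projection takes a well-defined value because the norm is strictly convex. Let us note some properties
of these maps. First, the function $\mathrm{dist}(\cdot, S)$ is convex [Roc70, p. 34]. Next, the maps $\pi_S$ and $\mathbf{I} - \pi_S$ are
nonexpansive with respect to the Euclidean norm [Roc70, Thm. 31.5 et seq.]:

$$ \|\pi_S(\mathbf{x}) - \pi_S(\mathbf{y})\| \le \|\mathbf{x} - \mathbf{y}\| \quad\text{and}\quad \|(\mathbf{I} - \pi_S)(\mathbf{x}) - (\mathbf{I} - \pi_S)(\mathbf{y})\| \le \|\mathbf{x} - \mathbf{y}\| \quad\text{for all } \mathbf{x}, \mathbf{y} \in \mathbb{R}^d. \tag{3.4} $$

As a consequence, the projection $\pi_S$ is continuous, and the distance function is 1-Lipschitz with respect to the
Euclidean norm:

$$ |\mathrm{dist}(\mathbf{x}, S) - \mathrm{dist}(\mathbf{y}, S)| \le \|\mathbf{x} - \mathbf{y}\| \quad\text{for all } \mathbf{x}, \mathbf{y} \in \mathbb{R}^d. \tag{3.5} $$

The squared distance is differentiable everywhere, and the derivative satisfies

$$ \nabla\, \mathrm{dist}^2(\mathbf{x}, S) = 2(\mathbf{x} - \pi_S(\mathbf{x})) \quad\text{for all } \mathbf{x} \in \mathbb{R}^d. \tag{3.6} $$

This point follows from [RW98, Thm. 2.26].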
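
3.6. Euclidean projections onto cones. Let $C \in \mathcal{C}_d$ be a closed convex cone. Recall that the Euclidean
projection onto the cone $C$ is the map

$$ \Pi_C : \mathbb{R}^d \to C \quad\text{where}\quad \Pi_C(\mathbf{x}) := \arg\min\{ \|\mathbf{x} - \mathbf{y}\| : \mathbf{y} \in C \}. $$

We have used a separate notation for the projection onto a set because the projection onto a cone enjoys
a number of additional properties [HUL93a, Sec. III.3.2]. First, the projection onto a cone is nonnegative
homogeneous: $\Pi_C(\tau \mathbf{x}) = \tau\, \Pi_C(\mathbf{x})$ for all $\tau \ge 0$. Next, the cone $C$ induces an orthogonal decomposition of $\mathbb{R}^d$:

$$ \mathbf{x} = \Pi_C(\mathbf{x}) + \Pi_{C^\circ}(\mathbf{x}) \quad\text{and}\quad \langle \Pi_C(\mathbf{x}), \Pi_{C^\circ}(\mathbf{x}) \rangle = 0 \quad\text{for all } \mathbf{x} \in \mathbb{R}^d. \tag{3.7} $$

The decomposition (3.7) yields the Pythagorean identity

$$ \|\mathbf{x}\|^2 = \|\Pi_C(\mathbf{x})\|^2 + \|\Pi_{C^\circ}(\mathbf{x})\|^2 \quad\text{for all } \mathbf{x} \in \mathbb{R}^d. \tag{3.8} $$

It also implies the distance formulas

$$ \mathrm{dist}(\mathbf{x}, C) = \|\mathbf{x} - \Pi_C(\mathbf{x})\| = \|\Pi_{C^\circ}(\mathbf{x})\| \quad\text{for all } \mathbf{x} \in \mathbb{R}^d. \tag{3.9} $$

The squared norm of the projection has a nice regularity property, which follows from a short argument based
on (3.6), (3.7), and (3.9):

$$ \nabla \|\Pi_C(\mathbf{x})\|^2 = 2\, \Pi_C(\mathbf{x}) \quad\text{for all } \mathbf{x} \in \mathbb{R}^d. \tag{3.10} $$

Finally, the projection map decomposes under Cartesian products. For two cones $C_1 \in \mathcal{C}_{d_1}$ and $C_2 \in \mathcal{C}_{d_2}$, the
product $C_1 \times C_2 \in \mathcal{C}_{d_1 + d_2}$, and

$$ \Pi_{C_1 \times C_2}((\mathbf{x}_1, \mathbf{x}_2)) = (\Pi_{C_1}(\mathbf{x}_1), \Pi_{C_2}(\mathbf{x}_2)) \quad\text{for all } \mathbf{x}_1 \in \mathbb{R}^{d_1} \text{ and } \mathbf{x}_2 \in \mathbb{R}^{d_2}. \tag{3.11} $$

The relation (3.11) is easy to check directly.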
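
As a small numerical illustration of the decomposition (3.7), the Pythagorean identity (3.8), and the distance formula (3.9), the following Python sketch works with the nonnegative orthant. It relies on the standard facts (assumed here, not derived in the text) that the projection onto the orthant keeps the positive part of each coordinate and that the polar cone is the nonpositive orthant. The function names are hypothetical.

```python
import math
import random

def project_orthant(x):
    """Euclidean projection onto the nonnegative orthant (keep positive parts)."""
    return [max(t, 0.0) for t in x]

def project_polar_orthant(x):
    """Projection onto the polar cone, assumed here to be the nonpositive orthant."""
    return [min(t, 0.0) for t in x]

def sq_norm(x):
    return sum(t * t for t in x)

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(6)]
p = project_orthant(x)
q = project_polar_orthant(x)

# (3.7): x = p + q and <p, q> = 0
decomposition_gap = max(abs(a - (b + c)) for a, b, c in zip(x, p, q))
inner_product = sum(b * c for b, c in zip(p, q))
# (3.8): ||x||^2 = ||p||^2 + ||q||^2
pythagoras_gap = abs(sq_norm(x) - sq_norm(p) - sq_norm(q))
# (3.9): dist(x, C) = ||x - p|| = ||q||
residual = [a - b for a, b in zip(x, p)]
distance_gap = abs(math.sqrt(sq_norm(residual)) - math.sqrt(sq_norm(q)))
```

All four quantities should vanish up to rounding error for any input vector.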

3.7. Probability. The symbol $\mathbb{P}\{\cdot\}$ denotes the probability of an event, and $\mathbb{E}[\cdot]$ returns the expectation of
a random variable. We reserve the letter $\mathbf{g}$ for a standard normal vector, i.e., a vector whose entries are
independent normal variables with mean zero and variance one. We reserve the letter $\boldsymbol{\theta}$ for a random vector
uniformly distributed on the Euclidean unit sphere. The set $O_d$ of orthogonal matrices is a compact Lie group,
so it admits an invariant Haar (i.e., uniform) probability measure. We reserve the letter $\mathbf{Q}$ for a uniformly
random element of $O_d$, and we refer to $\mathbf{Q}$ as a random orthogonal basis.

4. Calculating the statistical dimension 

Section 2 demonstrates that we can pinpoint the phase transitions in random linear inverse problems 
and random demixing problems as soon as we know the statistical dimension of the appropriate descent 
cones. The goal of this section is to show that we can obtain highly accurate approximation formulas for the 
statistical dimension of a cone with a modest amount of effort. 

Section 4.1 develops some basic properties of the statistical dimension that aid this investigation. The 
rest of the section explains how to compute the statistical dimension for several families of cones that 
arise in applications. This discussion includes a recipe for bounding the statistical dimension of a descent 
cone; Theorem 4.5 ensures that the recipe produces an accurate answer for many problems of interest. We 
summarize our calculations in Table 4.1 and in Figure 4.1. 

4.1. Basic facts about the statistical dimension. The statistical dimension has a number of valuable 
properties that are readily apparent from Definition 2.1. These facts provide useful tools for making 
computations, and they strengthen the analogy between the statistical dimension of a cone and the linear 
dimension of a subspace. 

Proposition 4.1 (Properties of statistical dimension). Let $C \in \mathcal{C}_d$ be a closed convex cone. The statistical
dimension obeys the following laws.

(1) Gaussian formulation. The statistical dimension is defined as

$$ \delta(C) := \mathbb{E}[\|\Pi_C(\mathbf{g})\|^2] \quad\text{where}\quad \mathbf{g} \sim \mathrm{NORMAL}(\mathbf{0}, \mathbf{I}_d). \tag{4.1} $$

(2) Spherical formulation. An equivalent definition states that

$$ \delta(C) := d\, \mathbb{E}[\|\Pi_C(\boldsymbol{\theta})\|^2] \quad\text{where}\quad \boldsymbol{\theta} \sim \mathrm{UNIFORM}(S^{d-1}). \tag{4.2} $$

(3) Rotational invariance. The statistical dimension does not depend on the orientation of the cone:

$$ \delta(\mathbf{U}C) = \delta(C) \quad\text{for each } \mathbf{U} \in O_d. \tag{4.3} $$

(4) Subspaces. For a subspace $L \subset \mathbb{R}^d$, the statistical dimension satisfies $\delta(L) = \dim(L)$.

(5) Polarity. The statistical dimension can also be expressed in terms of the polar cone:

$$ \delta(C) = \mathbb{E}[\mathrm{dist}^2(\mathbf{g}, C^\circ)]. \tag{4.4} $$

(6) Totality. The total statistical dimension of a cone and its polar equals the ambient dimension:

$$ \delta(C) + \delta(C^\circ) = d. \tag{4.5} $$

This generalizes the property $\dim(L) + \dim(L^\perp) = d$ for each subspace $L \subset \mathbb{R}^d$.

(7) Direct products. For each cone $K \in \mathcal{C}_{d'}$,

$$ \delta(C \times K) = \delta(C) + \delta(K). \tag{4.6} $$

In particular, the statistical dimension is invariant under embedding:

$$ \delta(C \times \{\mathbf{0}_{d'}\}) = \delta(C). $$

The relation (4.6) generalizes the rule $\dim(L \times M) = \dim(L) + \dim(M)$ for linear subspaces $L$ and $M$.

(8) Monotonicity. For each cone $K \in \mathcal{C}_d$, the inclusion $C \subset K$ implies that $\delta(C) \le \delta(K)$.

Proof. The Gaussian formulation in (4.1) simply repeats Definition 2.1. To derive the spherical formulation
(4.2) from (4.1), we introduce the spherical decomposition $\mathbf{g} = R\boldsymbol{\theta}$, where $R := \|\mathbf{g}\|$ is a chi random
variable with $d$ degrees of freedom that is independent from the spherical variable $\boldsymbol{\theta}$. Use nonnegative
homogeneity to draw the term $R$ out of the projection and the squared norm, factor the expectation via
independence, and note that $\mathbb{E}[R^2] = d$.

The rotational invariance property (4.3) follows immediately from the fact that a standard normal vector is 
rotationally invariant. 

To compute the statistical dimension of a subspace, note that the Euclidean projection of a standard normal 
vector onto a subspace has the standard normal distribution supported on that subspace, so its expected 
squared norm equals the dimension of the subspace. 

The polar identity (4.4) is a direct consequence of the distance formula (3.9), which implies that $\|\Pi_C(\mathbf{g})\| =
\mathrm{dist}(\mathbf{g}, C^\circ)$. Similarly, the totality law (4.5) follows from the Pythagorean identity (3.8).

We obtain the direct product rule (4.6) from the observation (3.11) that projection splits over a direct 
product, coupled with the fact that projecting a standard normal vector onto each of two orthogonal subspaces 
results in two independent standard normal vectors. 

Finally, we verify the monotonicity law. Polarity reverses inclusion, so $K^\circ \subset C^\circ$. Using the polarity
identity (4.4) twice, we obtain

$$ \delta(C) = \mathbb{E}[\mathrm{dist}^2(\mathbf{g}, C^\circ)] \le \mathbb{E}[\mathrm{dist}^2(\mathbf{g}, K^\circ)] = \delta(K). $$

This completes the recitation. □
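
To make the Gaussian formulation (4.1) and the spherical formulation (4.2) concrete, here is a minimal Monte Carlo sketch in Python for the nonnegative orthant, whose statistical dimension is listed as $\frac{1}{2}d$ in Table 4.1. The helper names, the sample size, and the seed are illustrative assumptions; the estimate is only as accurate as the sampling allows.

```python
import math
import random

def sq_norm_proj_orthant(x):
    """||Pi_C(x)||^2 when C is the nonnegative orthant: sum of squared positive parts."""
    return sum(t * t for t in x if t > 0.0)

def estimate_delta_orthant(d, trials, seed=0):
    """Return Monte Carlo estimates of (4.1) and (4.2) for the orthant in R^d."""
    rng = random.Random(seed)
    gauss_sum = 0.0
    sphere_sum = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        r = math.sqrt(sum(t * t for t in g))
        theta = [t / r for t in g]                      # uniform on the unit sphere
        gauss_sum += sq_norm_proj_orthant(g)            # Gaussian formulation (4.1)
        sphere_sum += d * sq_norm_proj_orthant(theta)   # spherical formulation (4.2)
    return gauss_sum / trials, sphere_sum / trials

d = 20
delta_gauss, delta_sphere = estimate_delta_orthant(d, trials=20000)
```

Both estimates should be close to $d/2 = 10$, which also agrees with the totality law (4.5) because the orthant and its polar are congruent.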

The statistical dimension also enjoys some deeper properties. We reserve these results until Section 5.3, 
which gives us time to introduce additional tools from integral geometry. 

4.2. Self-dual cones. We say that a cone C is self-dual when C° = -C. Self-dual cones are ubiquitous in the 
theory and practice of convex optimization. Here are three important examples: 

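(1) The nonnegative orthant. The cone $\mathbb{R}^d_+ := \{ \mathbf{x} \in \mathbb{R}^d : x_i \ge 0 \text{ for } i = 1, \dots, d \}$ is self-dual.

(2) The second-order cone. The cone $\mathbb{L}^{d+1} := \{ (\mathbf{x}, \tau) \in \mathbb{R}^{d+1} : \|\mathbf{x}\| \le \tau \}$ is self-dual. This example is
sometimes called the Lorentz cone or the ice-cream cone.

(3) Symmetric positive-semidefinite matrices. The cone $\mathbb{S}^{n \times n}_+ := \{ \mathbf{X} \in \mathbb{R}^{n \times n}_{\mathrm{sym}} : \mathbf{X} \succeq \mathbf{0} \}$ is self-dual, where
the curly inequality denotes the semidefinite order. Note that the linear space $\mathbb{R}^{n \times n}_{\mathrm{sym}}$ of $n \times n$ symmetric
matrices has dimension $\frac{1}{2} n(n+1)$.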

For a self-dual cone, the computation of the statistical dimension is particularly simple; cf. [CRPW12, Cor. 3.8]. 
The first three entries in Table 4.1 follow instantly from this result. 

Table 4.1

| Cone | Notation | Statistical dimension | Location |
|---|---|---|---|
| The nonnegative orthant | $\mathbb{R}^d_+$ | $\frac{1}{2} d$ | Sec. 4.2 |
| The second-order cone | $\mathbb{L}^{d+1}$ | $\frac{1}{2}(d+1)$ | Sec. 4.2 |
| Symmetric positive-semidefinite matrices | $\mathbb{S}^{n \times n}_+$ | $\frac{1}{4} n(n+1)$ | Sec. 4.2 |
| Circular cone in $\mathbb{R}^d$ of angle $\alpha$ | $\mathrm{Circ}_d(\alpha)$ | $d \sin^2(\alpha) + O(1)$ | Sec. 4.3 |
| Chambers of finite reflection groups acting on $\mathbb{R}^d$ | $C_A$; $C_{BC}$ | $\log(d) + O(1)$; $\frac{1}{2}\log(d) + O(1)$ | Sec. 4.7 |



Proposition 4.2 (Self-dual cones). Let $C \in \mathcal{C}_d$ be a self-dual cone. The statistical dimension $\delta(C) = \frac{1}{2} d$.

Proof. Just observe that $\delta(C) = \frac{1}{2}[\delta(C) + \delta(C^\circ)] = \frac{1}{2} d$. The first identity holds because of the self-dual property
of the cone and the rotational invariance (4.3) of statistical dimension. The second equality follows from the
totality law (4.5). □

4.3. Circular cones. The circular cone $\mathrm{Circ}_d(\alpha)$ in $\mathbb{R}^d$ with angle $0 \le \alpha \le \frac{\pi}{2}$ is defined as

$$ \mathrm{Circ}_d(\alpha) := \{ \mathbf{x} \in \mathbb{R}^d : x_1 \ge \|\mathbf{x}\| \cos(\alpha) \}. $$

In particular, the cone $\mathrm{Circ}_d(\frac{\pi}{4})$ is isometric to the second-order cone $\mathbb{L}^d$. Circular cones have numerous
applications in optimization; we refer the reader to [BV04, Sec. 4], [BTN01, Sec. 3], and [AG03] for details.

We can obtain an accurate expression for the statistical dimension of a circular cone using trigonometry 
and basic asymptotic methods. 

Proposition 4.3 (Circular cones). The statistical dimension of a circular cone satisfies

$$ \delta(\mathrm{Circ}_d(\alpha)) = d \sin^2(\alpha) + O(1). \tag{4.7} $$

The error term is approximately equal to $\cos(2\alpha)$. See Figure 4.1 [left] for a plot of (4.7).

Turn to Appendix C.1 for the proof, which seems to be original. Even though the formula in Proposition 4.3 is
simple, it already gives an accurate approximation in moderate dimensions.
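
The approximation (4.7) can be probed numerically. The Python sketch below is an illustration under stated assumptions rather than the argument of Appendix C.1: it uses the elementary observation that projecting onto a circular cone is a planar problem in the coordinates $(x_1, \|(x_2, \dots, x_d)\|)$, and it samples the squared norm of the last $d-1$ Gaussian coordinates as a chi-square variable. All names are hypothetical.

```python
import math
import random

def sq_norm_proj_circular(a, b, alpha):
    """||Pi_C(x)||^2 for C = Circ_d(alpha), with a = x_1 and b = norm of the other coordinates."""
    r2 = a * a + b * b
    angle = math.atan2(b, a)                    # angle between x and the axis e_1, in [0, pi]
    if angle <= alpha:                          # x already lies in the cone
        return r2
    if angle >= alpha + math.pi / 2:            # x lies in the polar cone, so it projects to 0
        return 0.0
    return r2 * math.cos(angle - alpha) ** 2    # project onto the nearest boundary ray

def estimate_delta_circular(d, alpha, trials, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = rng.gauss(0.0, 1.0)
        # Squared norm of d-1 standard normal coordinates: chi-square with d-1
        # degrees of freedom, i.e. a Gamma variable with shape (d-1)/2 and scale 2.
        b = math.sqrt(rng.gammavariate((d - 1) / 2.0, 2.0))
        total += sq_norm_proj_circular(a, b, alpha)
    return total / trials

d = 100
est_quarter = estimate_delta_circular(d, math.pi / 4, trials=40000)
est_sixth = estimate_delta_circular(d, math.pi / 6, trials=40000)
```

For $\alpha = \pi/4$ the estimate should be near $d/2 = 50$, in line with Proposition 4.2; for $\alpha = \pi/6$ it should be near $d \sin^2(\alpha) + \cos(2\alpha) = 25.5$.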

4.4. A recipe for the statistical dimension of a descent cone. Theorems II and III allow us to locate the 
phase transition in a regularized inverse problem with random data. To apply these results, we must be able 
to compute the statistical dimension of a descent cone associated with the regularizer. In this section, we 
describe a method that delivers a superb upper bound for the statistical dimension of a descent cone. This 
technique is based on an elegant application of polarity. Stojnic [Sto09] developed the basic argument, which 
Chandrasekaran et al. [CRPW12] subsequently refined. We have undertaken some extra technical work to 
prove that the upper bound is accurate for the most important examples. 

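Proposition 4.4 (The statistical dimension of a descent cone). Let $f$ be a proper convex function. Assume that
the subdifferential $\partial f(\mathbf{x})$ is nonempty, compact, and does not contain the origin. Then

$$ \delta(\mathcal{D}(f, \mathbf{x})) \le \inf_{\tau \ge 0} \mathbb{E}[\mathrm{dist}^2(\mathbf{g}, \tau \cdot \partial f(\mathbf{x}))]. \tag{4.8} $$

The function on the right-hand side of (4.8), namely

$$ F : \tau \mapsto \mathbb{E}[\mathrm{dist}^2(\mathbf{g}, \tau \cdot \partial f(\mathbf{x}))] \quad\text{for } \tau \ge 0, \tag{4.9} $$

is strictly convex, continuous at $\tau = 0$, and differentiable for $\tau > 0$. It achieves its minimum at a unique point.

Proof. We use the polarity relation (4.4) to compute the statistical dimension:

$$ \delta(\mathcal{D}(f, \mathbf{x})) = \mathbb{E}[\mathrm{dist}^2(\mathbf{g}, \mathcal{D}(f, \mathbf{x})^\circ)] = \mathbb{E}\Big[\mathrm{dist}^2\Big(\mathbf{g}, \bigcup_{\tau \ge 0} \tau \cdot \partial f(\mathbf{x})\Big)\Big] = \mathbb{E}\Big[\inf_{\tau \ge 0} \mathrm{dist}^2(\mathbf{g}, \tau \cdot \partial f(\mathbf{x}))\Big]. $$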

[Figure 4.1 panels: Circular cones (horizontal axis $\alpha$: angle of cone); Descent cone of the $\ell_1$ norm (horizontal axis $\rho$: nonzeros/dimension); Descent cones of the Schatten 1-norm (horizontal axis $\rho$: rank/dimension).]

Figure 4.1: Asymptotic statistical dimension computations. In each panel, we take the dimensional parameters
to infinity. [left] Circular cones. The plot shows the normalized statistical dimension $\delta(\cdot)/d$ of the circular
cone $\mathrm{Circ}_d(\alpha)$. [center] $\ell_1$ descent cones. The curve traces the normalized statistical dimension $\delta(\cdot)/d$ of the
descent cone of the $\ell_1$ norm on $\mathbb{R}^d$ at a vector with $\lceil \rho d \rceil$ nonzero entries. [right] Schatten 1-norm descent
cones. The normalized statistical dimension $\delta(\cdot)/(mn)$ of the descent cone of the $S_1$ norm on $\mathbb{R}^{m \times n}$ at a matrix
with rank $\lceil \rho m \rceil$ for several fixed aspect ratios $\nu = m/n$. As the aspect ratio $\nu \to 0$, the limiting curve is $\rho \mapsto 2\rho - \rho^2$.



The second identity follows from the fact (3.3) that, under our technical assumptions, the polar of the descent 
cone is the cone generated by the subdifferential. The third identity holds because the distance to a union is 
the infimal distance to any one of its members. To reach (4.8), we apply Jensen's inequality. See Lemma B.2
in Appendix B.1 for the proof of the remaining claims. □

Proposition 4.4 suggests a method, displayed as Recipe 4.1, for bounding the statistical dimension of a
descent cone. In the next two sections, we use this approach to estimate the statistical dimension of the
descent cone of the $\ell_1$ norm at a sparse vector and the Schatten 1-norm at a low-rank matrix.

The literature contains empirical evidence that Recipe 4.1 leads to very accurate upper bounds. We have
obtained an error estimate which shows that the recipe is precise for many important examples.

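Theorem 4.5 (Error bound for descent cone recipe). Let $f$ be a norm on $\mathbb{R}^d$, and fix a nonzero point $\mathbf{x}$. Then

$$ 0 \le \inf_{\tau \ge 0} F(\tau) - \delta(\mathcal{D}(f, \mathbf{x})) \le \frac{2 \sup\{ \|\mathbf{s}\| : \mathbf{s} \in \partial f(\mathbf{x}) \}}{f(\mathbf{x}/\|\mathbf{x}\|)}, \tag{4.10} $$

where the function $F$ is defined in (4.9).

The proof of Theorem 4.5 is technical in nature, so we defer the details to Appendix B.2. The application of
this result requires some care because many different vectors $\mathbf{x}$ can generate the same subdifferential $\partial f(\mathbf{x})$
and hence the same descent cone $\mathcal{D}(f, \mathbf{x})$. From this class of vectors, we ought to select one that maximizes
the value $f(\mathbf{x}/\|\mathbf{x}\|)$.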

Remark 4.6 (Improved error bounds). We have established several variants of Theorem 4.5. For example, 
we can study arbitrary convex functions instead of norms if we move to an appropriate asymptotic setting. 
Unfortunately, we have not identified the optimal form for the error bound (4.10) in the general case. As 
such, we have chosen to present a simple result that justifies our analysis of phase transitions in $\ell_1$ and $S_1$
minimization problems.

4.5. Descent cones of the $\ell_1$ norm. When we wish to solve an inverse problem with a sparse unknown,
we often use the $\ell_1$ norm as a regularizer; cf. (2.6), (2.10), and (2.11). Our next result summarizes the
calculations required to obtain the statistical dimension of the descent cone of the $\ell_1$ norm at a sparse vector.
When we combine this proposition with Theorems II and III, we obtain the exact location of the phase
transition for $\ell_1$ regularized inverse problems whose dimension is large.

Recipe 4.1.

Assume that $f$ is a proper convex function on $\mathbb{R}^d$.

Assume that the subdifferential $\partial f(\mathbf{x})$ is nonempty, compact, and does not contain the origin.

(1) Identify the subdifferential $S = \partial f(\mathbf{x})$.

(2) For each $\tau \ge 0$, compute $F(\tau) = \mathbb{E}[\mathrm{dist}^2(\mathbf{g}, \tau S)]$.

(3) Find the unique solution, if it exists, to the stationary equation $F'(\tau) = 0$.

(4) If the stationary equation has a solution $\tau_\star$, then $\delta(\mathcal{D}(f, \mathbf{x})) \le F(\tau_\star)$.

(5) Otherwise, the bound is vacuous: $\delta(\mathcal{D}(f, \mathbf{x})) \le F(0) = d$.
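
The steps of the recipe can be sketched numerically. The Python code below is a Monte Carlo illustration, not the analytic calculation carried out in the paper: it takes the $\ell_1$ norm as the example, replaces the expectation in step (2) by an average over a fixed Gaussian sample, and replaces the stationary equation in step (3) by a ternary search, which is valid because the sample average is convex in $\tau$. The description of the $\ell_1$ subdifferential (entries equal to the sign of $x_i$ on the support and bounded by one in magnitude elsewhere) is a standard fact assumed here, and all names and parameter values are hypothetical.

```python
import random

def dist2_to_scaled_l1_subdiff(g, signs, tau):
    """Squared distance from g to tau * S, where S is the l1 subdifferential at x.

    signs[i] is sign(x_i) in {-1, 0, +1}. S consists of the vectors u with
    u_i = signs[i] on the support of x and |u_i| <= 1 off the support.
    """
    total = 0.0
    for gi, si in zip(g, signs):
        if si != 0:
            total += (gi - tau * si) ** 2
        else:
            total += max(abs(gi) - tau, 0.0) ** 2
    return total

def recipe_upper_bound(signs, trials=200, seed=0, tau_max=10.0, iters=40):
    """Monte Carlo sketch of steps (2)-(4): minimize the sample average of dist^2 over tau."""
    rng = random.Random(seed)
    d = len(signs)
    samples = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(trials)]

    def F(tau):
        # Sample-average approximation of (4.9); convex in tau for fixed samples.
        return sum(dist2_to_scaled_l1_subdiff(g, signs, tau) for g in samples) / trials

    lo, hi = 0.0, tau_max
    for _ in range(iters):                      # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if F(m1) <= F(m2):
            hi = m2
        else:
            lo = m1
    tau_star = 0.5 * (lo + hi)
    return tau_star, F(tau_star)

d, s = 100, 10
signs = [1] * s + [0] * (d - s)                 # sign pattern of a vector with s nonzeros
tau_star, bound = recipe_upper_bound(signs)
```

For $d = 100$ and $s = 10$ the returned bound should be roughly one third of the ambient dimension, subject to sampling error.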



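Proposition 4.7 (Descent cones of the $\ell_1$ norm). Let $\mathbf{x}$ be a vector in $\mathbb{R}^d$ with $s$ nonzero entries. Then the
normalized statistical dimension of the descent cone of the $\ell_1$ norm at $\mathbf{x}$ satisfies the bounds

$$ \psi(s/d) - \frac{2}{\sqrt{sd}} \le \frac{\delta(\mathcal{D}(\|\cdot\|_1, \mathbf{x}))}{d} \le \psi(s/d). \tag{4.11} $$

The function $\psi : [0, 1] \to [0, 1]$ is defined as

$$ \psi(\rho) := \inf_{\tau \ge 0} \left\{ \rho(1 + \tau^2) + (1 - \rho) \sqrt{\frac{2}{\pi}} \left[ (1 + \tau^2) \int_\tau^\infty e^{-u^2/2} \, du - \tau e^{-\tau^2/2} \right] \right\}. \tag{4.12} $$

The infimum in (4.12) is achieved for the unique positive $\tau$ that solves the stationary equation

$$ \int_\tau^\infty \left( \frac{u}{\tau} - 1 \right) e^{-u^2/2} \, du = \sqrt{\frac{\pi}{2}} \cdot \frac{\rho}{1 - \rho}. \tag{4.13} $$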



See Figure 4.1 [center] for a plot of the function (4.12). 



Proposition 4.7 is a direct consequence of Recipe 4.1 and the error bound in Theorem 4.5. See Appendix C.2
for details of the proof; Appendix A.2 explains the numerical aspects.
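
For readers who want to evaluate the function in (4.12) themselves, the following Python sketch shows one possible approach; it is not the procedure of Appendix A.2. It rewrites the Gaussian integral through the complementary error function and solves the stationary equation (4.13) by bisection. The function names and the tolerance are assumptions made for illustration.

```python
import math

def gaussian_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def gaussian_tail(t):
    """P{g >= t} for a standard normal variable g."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def objective(rho, tau):
    """The expression minimized in (4.12), written with the normal pdf and tail."""
    # sqrt(2/pi) * [(1 + tau^2) * int_tau^inf exp(-u^2/2) du - tau * exp(-tau^2/2)]
    bracket = 2.0 * ((1.0 + tau * tau) * gaussian_tail(tau) - tau * gaussian_pdf(tau))
    return rho * (1.0 + tau * tau) + (1.0 - rho) * bracket

def stationary_tau(rho, tol=1e-12):
    """Solve the stationary equation (4.13) by bisection, for 0 < rho < 1."""
    def half_derivative(tau):
        # Half of d/dtau objective(rho, tau); its root is the solution of (4.13).
        return rho * tau - 2.0 * (1.0 - rho) * (gaussian_pdf(tau) - tau * gaussian_tail(tau))
    lo, hi = 0.0, 1.0
    while half_derivative(hi) < 0.0:            # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if half_derivative(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def psi(rho):
    return objective(rho, stationary_tau(rho))

values = [psi(rho) for rho in (0.1, 0.3, 0.5)]
```

Multiplying `psi(s / d)` by `d` gives the upper bound in (4.11) on the statistical dimension of the descent cone.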

Let us emphasize the following consequences of Proposition 4.7. When the number s of nonzeros in the 
vector x is proportional to the ambient dimension d, the error in the statistical dimension calculation (4.11) 
is vanishingly small relative to the ambient dimension. When x is sparser, it is more appropriate to compare 
the error with the statistical dimension itself. Thus, 



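$$ \frac{|\delta(\mathcal{D}(\|\cdot\|_1, \mathbf{x})) - d \cdot \psi(s/d)|}{\delta(\mathcal{D}(\|\cdot\|_1, \mathbf{x}))} \le \frac{2}{\sqrt{\delta(\mathcal{D}(\|\cdot\|_1, \mathbf{x}))}} \quad\text{when } s \ge \sqrt{d} + 1. $$

We have used the observation that $\delta(\mathcal{D}(\|\cdot\|_1, \mathbf{x})) \ge s - 1$, which holds because $\mathcal{D}(\|\cdot\|_1, \mathbf{x})$ contains the
$(s-1)$-dimensional subspace parallel with the face of the $\ell_1$ ball containing $\mathbf{x}$.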

Aside from the first inequality in (4.11), the calculations and the resulting formulae in Proposition 4.7 are 
not substantially novel. Most of the existing analysis concerns the phase transition in compressed sensing, 
i.e., the $\ell_1$ minimization problem (2.6) with Gaussian measurements. In this setting, Donoho [Don06b] and
Donoho & Tanner [DT09a] obtained an asymptotic upper bound, equivalent to the upper bound in (4.11), 
from polytope angle calculations. Stojnic [Sto09] established the same asymptotic upper bound using a 
precursor of Recipe 4.1; see also Chandrasekaran et al. [CRPW12, App. C]. In addition, there are some 
heuristic arguments, based on ideas from statistical physics, that lead to the same result, cf. [DMM09a] 
and [DMM09b, Sec. 17]. Very recently, Bayati et al. [BLM12] have shown that, in the asymptotic setting, the 
compressed sensing problem undergoes a phase transition at the location predicted by (4.11). 

4.6. Descent cones of the Schatten 1-norm. When we wish to solve an inverse problem whose unknown is 
a low-rank matrix, we often use the Schatten 1-norm as a regularizer, as in (2.7) and (2.11). The following 
result gives a sharp asymptotic expression for the statistical dimension of the descent cone of the $S_1$ norm at a
low-rank matrix. Together with Theorems II and III, this proposition allows us to identify the exact location of
the phase transition for $S_1$ regularized inverse problems whose dimension is large.

Proposition 4.8 (Descent cones of the $S_1$ norm). Consider a sequence $\{\mathbf{X}(r, m, n)\}$ of matrices where $\mathbf{X}(r, m, n)$
has rank $r$ and dimension $m \times n$ with $m \le n$. Suppose that $r, m, n \to \infty$ with limiting ratios $r/m \to \rho \in (0, 1)$ and
$m/n \to \nu \in (0, 1]$. Then

$$ \frac{\delta(\mathcal{D}(\|\cdot\|_{S_1}, \mathbf{X}(r, m, n)))}{mn} \to \psi(\rho, \nu). \tag{4.14} $$

The function $\psi : [0, 1] \times [0, 1] \to [0, 1]$ is defined as

$$ \psi(\rho, \nu) := \inf_{\tau \ge 0} \left\{ \rho\nu + (1 - \rho\nu) \left[ \rho(1 + \tau^2) + (1 - \rho) \int_{a_- \vee \tau}^{a_+} (u - \tau)^2 \, \varphi_y(u) \, du \right] \right\}. \tag{4.15} $$

The quantity $y := (\nu - \rho\nu)/(1 - \rho\nu)$, and the limits of the integral are $a_\pm := 1 \pm \sqrt{y}$. The integral kernel $\varphi_y$ is a
probability density supported on $[a_-, a_+]$:

$$ \varphi_y(u) := \frac{1}{\pi y u} \sqrt{(u^2 - a_-^2)(a_+^2 - u^2)} \quad\text{for } u \in [a_-, a_+]. $$

The optimal value of $\tau$ in (4.15) satisfies the stationary equation

$$ \int_{a_- \vee \tau}^{a_+} \left( \frac{u}{\tau} - 1 \right) \varphi_y(u) \, du = \frac{\rho}{1 - \rho}. \tag{4.16} $$

See Figure 4.1 [right] for a visualization of the curve (4.15) as a function of $\rho$ for several choices of $\nu$. The operator
$\vee$ returns the maximum of two numbers.

See Appendix C.3 for a proof of Proposition 4.8. Appendix A.2 contains details of the numerical calculation. 

The literature contains several papers that, in effect, contain loose upper bounds for the statistical dimension 
of the descent cones of the Schatten 1-norm [RXH11, OKH11]. We single out the work [OH10] of Oymak
& Hassibi, which identifies an empirically sharp upper bound via a laborious argument. The approach here 
is more in the spirit of the upper bound in [CRPW12, App. C], but we have taken extra care to obtain the 
asymptotically correct estimate. 

4.7. Normal cones to permutahedra. We close this section with a more sophisticated example. The (signed)
permutahedron generated by a vector $\mathbf{x} \in \mathbb{R}^d$ is the convex hull of all (signed) coordinate permutations of the
vector:

$$ \mathcal{P}(\mathbf{x}) := \mathrm{conv}\{ \sigma(\mathbf{x}) : \sigma \text{ a coordinate permutation} \} \tag{4.17} $$

$$ \mathcal{P}_\pm(\mathbf{x}) := \mathrm{conv}\{ \sigma_\pm(\mathbf{x}) : \sigma_\pm \text{ a signed coordinate permutation} \}. \tag{4.18} $$

(A signed permutation permutes the coordinates of a vector and gives each one an arbitrary sign.) Figure 4.2 
displays two signed permutahedra and the normal cone at a vertex. 

In this section, we present an exact formula for the statistical dimension of the normal cone of a permuta- 
hedron. In Section 9, we use this calculation to study an application in signal processing that was proposed 
in [CRPW12, p. 812]. 

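Proposition 4.9 (Normal cones of permutahedra). Suppose that $\mathbf{x}$ has distinct entries. The statistical dimension
of the normal cone at a vertex of the (signed) permutahedron generated by $\mathbf{x}$ satisfies

$$ \delta(\mathcal{N}(\mathcal{P}(\mathbf{x}), \mathbf{x})) = H_d \quad\text{and}\quad \delta(\mathcal{N}(\mathcal{P}_\pm(\mathbf{x}), \mathbf{x})) = \tfrac{1}{2} H_d, $$

where $H_d := \sum_{i=1}^d i^{-1}$ is the $d$th harmonic number.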

The proof of Proposition 4.9 appears in Appendix C.4. The argument illustrates some deep connections 
between conic geometry and classical combinatorics. 
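
The unsigned case of Proposition 4.9 lends itself to a quick numerical sanity check. The Python sketch below rests on an assumption that is not spelled out in this section: when the entries of $\mathbf{x}$ are strictly decreasing, the rearrangement inequality identifies the normal cone $\mathcal{N}(\mathcal{P}(\mathbf{x}), \mathbf{x})$ with the cone of vectors whose entries are nonincreasing. Projection onto that cone is computed with the pool-adjacent-violators scheme from isotonic regression. The names and sample sizes are hypothetical.

```python
import random

def project_nonincreasing(g):
    """Euclidean projection onto {u : u_1 >= u_2 >= ... >= u_d} by pool-adjacent-violators."""
    blocks = []                                  # each block stores [sum, count]
    for value in g:
        blocks.append([value, 1])
        # Merge while the previous block has a smaller mean than the last block.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    projection = []
    for total, count in blocks:
        projection.extend([total / count] * count)
    return projection

def harmonic(d):
    """The d-th harmonic number H_d."""
    return sum(1.0 / i for i in range(1, d + 1))

def estimate_delta_monotone(d, trials, seed=0):
    """Monte Carlo estimate of E||Pi_C(g)||^2 for the cone of nonincreasing vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += sum(t * t for t in project_nonincreasing(g))
    return total / trials

d = 10
estimate = estimate_delta_monotone(d, trials=20000)
target = harmonic(d)
```

Under the stated assumption, the estimate should be close to $H_{10} \approx 2.93$.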

5. Tools from conic integral geometry 

To prove that the statistical dimension controls the location of phase transitions in random convex 
optimization problems, we rely on methods from conic integral geometry, the field of mathematics concerned 
with geometric properties of convex cones that remain invariant under rotations and reflections. Here are 
some of the guiding questions in this area: 

• What is the probability that a random unit vector lies at most a specified distance from a fixed cone? 

• What is the probability that a randomly rotated cone shares a ray with a fixed cone? 

Figure 4.2: Normal cone at the vertex of a permutahedron. The signed permutahedron $\mathcal{P}_\pm(\mathbf{x})$ generated
by [left] the vector $\mathbf{x} = (3, -1)$ and [right] the vector $\mathbf{x} = (3, -2)$. In each panel, the darker cone is the normal
cone $\mathcal{N}(\mathcal{P}_\pm(\mathbf{x}), \mathbf{x})$, and the lighter cone is its polar. Note that the normal cone does not depend on the generator
$\mathbf{x}$ provided that the entries of $\mathbf{x}$ are distinct.

The theory of conic integral geometry offers beautiful and precise answers, phrased in terms of a set of 
geometric invariants called conic intrinsic volumes. 

In the next section, we introduce the intrinsic volumes of a cone, we compute the intrinsic volumes of 
some basic cones, and we state the key facts about intrinsic volumes. Section 5.2 contains more advanced 
formulas from conic integral geometry, which are essential tools for identifying phase transitions. We revisit 
the statistical dimension in Section 5.3, where we develop some elegant new characterizations. 

The material in this section is adapted from the book [SW08, Sec. 6.5] and the dissertation [Ame11]. The
foundational research in this area is due to Santaló [San76, Part IV]. Modern treatments depend on the work
of Glasauer [Gla95, Gla96]. In these sources, the theory is presented in terms of spherical geometry, rather
than in terms of conical geometry. As noted in [AB12], the two approaches are equivalent, but the conic
viewpoint provides simpler formulas and has key benefits that are only revealed through deeper structural
investigations.

5.1. Conic intrinsic volumes. We begin with the definition of the intrinsic volumes of a convex cone. 

Definition 5.1 (Intrinsic volumes). Let $C \in \mathcal{C}_d$ be a polyhedral cone. For each $k = 0, 1, 2, \dots, d$, the $k$th (conic)
intrinsic volume $v_k(C)$ is given by

$$ v_k(C) := \mathbb{P}\{ \Pi_C(\mathbf{g}) \text{ lies in the relative interior of a } k\text{-dimensional face of } C \}. $$

As usual, $\mathbf{g}$ is a standard normal vector in $\mathbb{R}^d$. We extend this definition to a general closed convex cone in $\mathcal{C}_d$
by approximating it with a sequence of polyhedral cones.

For polyhedral cones, Definition 5.1 gives an attractive intuition. We can decompose the ambient space into
$d + 1$ disjoint regions, where the $k$th region contains the points whose projection onto the cone lies in the
relative interior of a $k$-dimensional face; the $k$th intrinsic volume reflects the proportion of the ambient space
comprised by the $k$th region. We see immediately that the sequence of intrinsic volumes forms a probability
distribution on $\{0, 1, 2, \dots, d\}$. The definition also delivers insight about several fundamental examples.

Example 5.2 (Linear subspaces). Let $L_j \subset \mathbb{R}^d$ be a $j$-dimensional subspace. Then $L_j$ is a polyhedral cone with
precisely one face, so the map $\Pi_{L_j}$ projects every point onto this $j$-dimensional face. Thus,

$$ v_k(L_j) = \begin{cases} 1, & k = j, \\ 0, & k \ne j, \end{cases} \quad\text{for } k = 0, 1, 2, \dots, d. $$

Example 5.3 (The nonnegative orthant). The nonnegative orthant is a polyhedral cone, and Table 4.1 lists
its statistical dimension as $\delta(\mathbb{R}^d_+) = \frac{1}{2} d$. The projection $\Pi_{\mathbb{R}^d_+}(\mathbf{g})$ lies in the relative interior of a $k$-dimensional
face of the orthant if and only if exactly $k$ coordinates of $\mathbf{g}$ are positive. Each coordinate of $\mathbf{g}$ is positive with
probability one-half and negative with probability one-half, and the coordinates are independent. Therefore,
the intrinsic volumes of the orthant are given by

$$ v_k(\mathbb{R}^d_+) = 2^{-d} \binom{d}{k} \quad\text{for } k = 0, 1, 2, \dots, d. $$

In other words, the intrinsic volumes coincide with the probability density of a $\mathrm{BINOMIAL}(d, \frac{1}{2})$ random
variable. From this representation, we learn that the largest intrinsic volume of the orthant occurs at the
index $k = \lfloor \frac{1}{2} d \rfloor$, and the large intrinsic volumes concentrate sharply around this index. We will prove that this
type of behavior is generic!

For a nonpolyhedral cone $C \in \mathcal{C}_d$, the projection formula in Definition 5.1 breaks down, and we can no
longer interpret the $k$th intrinsic volume in terms of the $k$-dimensional faces of $C$. Furthermore, it takes some
work to give sense to the phrase "approximation by polyhedral cones." We refer to the book [SW08, Sec. 6.5]
or the thesis [Ame11] for these important details.

In spite of these caveats, the intrinsic volumes of an arbitrary closed convex cone still form a probability 
distribution. The next result quantifies this statement and several other basic relationships. 

Fact 5.4 (Properties of intrinsic volumes). Let $C \in \mathcal{C}_d$ be a closed convex cone. The intrinsic volumes of the cone
obey the following laws.

(1) Distribution. The intrinsic volumes describe a probability distribution on $\{0, 1, \dots, d\}$:

$$ \sum_{k=0}^d v_k(C) = 1 \quad\text{and}\quad v_k(C) \ge 0 \quad\text{for } k = 0, 1, 2, \dots, d. \tag{5.1} $$

(2) Polarity. The intrinsic volumes reverse under polarity:

$$ v_k(C) = v_{d-k}(C^\circ) \quad\text{for } k = 0, 1, 2, \dots, d. \tag{5.2} $$

(3) Gauss-Bonnet formula. When $C$ is not a subspace,

$$ \sum_{\substack{k=0 \\ k \text{ even}}}^d v_k(C) = \sum_{\substack{k=1 \\ k \text{ odd}}}^d v_k(C) = \frac{1}{2}. \tag{5.3} $$

(4) Direct products. For each closed convex cone $K \in \mathcal{C}_{d'}$,

$$ v_k(C \times K) = \sum_{i+j=k} v_i(C) \cdot v_j(K) \quad\text{for } k = 0, 1, 2, \dots, d + d'. \tag{5.4} $$

The facts (5.1), (5.2), and (5.3) are drawn from [SW08, Sec. 6.5]. The product rule (5.4) appears in [Ame11,
Prop. 4.4.13].
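
The laws in Fact 5.4 can be checked exactly for the nonnegative orthant, whose intrinsic volumes are the binomial weights from Example 5.3. The Python sketch below is only an illustration with hypothetical function names. It uses exact rational arithmetic, and it assumes that the polar of the orthant (the nonpositive orthant) has the same intrinsic volumes as the orthant itself, so that (5.2) reduces to a symmetry of the sequence.

```python
from fractions import Fraction
from math import comb

def orthant_intrinsic_volumes(d):
    """Intrinsic volumes of the nonnegative orthant in R^d, as in Example 5.3."""
    return [Fraction(comb(d, k), 2 ** d) for k in range(d + 1)]

def product_volumes(v, w):
    """Intrinsic volumes of a direct product via the convolution rule (5.4)."""
    out = [Fraction(0)] * (len(v) + len(w) - 1)
    for i, vi in enumerate(v):
        for j, wj in enumerate(w):
            out[i + j] += vi * wj
    return out

d = 7
v = orthant_intrinsic_volumes(d)
total = sum(v)                        # distribution law (5.1)
polar_reversed = v[::-1]              # polarity law (5.2) for a cone congruent to its polar
even_sum = sum(v[0::2])               # Gauss-Bonnet formula (5.3), even indices
odd_sum = sum(v[1::2])                # Gauss-Bonnet formula (5.3), odd indices
# The orthant in R^7 is the product of the orthants in R^3 and R^4.
product = product_volumes(orthant_intrinsic_volumes(3), orthant_intrinsic_volumes(4))
```

Here the product rule amounts to the Vandermonde identity for binomial coefficients.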

Partial sums of intrinsic volumes play a central role in the kinematic theory of convex cones, described 
below. With an eye toward these developments, we make the following definition. 

Definition 5.5 (Tail functionals). Let $C \in \mathcal{C}_d$ be a closed convex cone. For each $k = 0, 1, 2, \dots, d$, the $k$th tail
functional is given by

$$ t_k(C) := v_k(C) + v_{k+1}(C) + \cdots = \sum_{j=k}^d v_j(C). \tag{5.5} $$

The $k$th half-tail functional is defined as

$$ h_k(C) := v_k(C) + v_{k+2}(C) + \cdots = \sum_{\substack{j=k \\ j-k \text{ even}}}^d v_j(C). \tag{5.6} $$

We require an interlacing inequality for tail functionals. 

Proposition 5.6 (Interlacing). For each closed convex cone $C \in \mathcal{C}_d$ that is not a linear subspace,

$$ 2h_k(C) \ge t_k(C) \ge 2h_{k+1}(C) \quad\text{for each } k = 0, 1, 2, \dots, d-1. $$

We establish Proposition 5.6 in Appendix D.

5.2. The formulas of conic geometry. We continue with a selection of more sophisticated results from conic 
integral geometry. These formulas provide detailed answers, expressed in terms of conic intrinsic volumes, to 
the geometric questions posed at the beginning of Section 5. For this discussion, we introduce a family of 
geometric functions that also plays a key role in our analysis. 

Definition 5.7 (Tropic functions). Let $L_k$ be a $k$-dimensional subspace of $\mathbb{R}^d$. Define

$$ I_k^d(\varepsilon) := \mathbb{P}\{ \|\Pi_{L_k}(\boldsymbol{\theta})\|^2 \ge \varepsilon \} \quad\text{for } \varepsilon \in [0, 1], \tag{5.7} $$

where $\boldsymbol{\theta}$ is uniformly distributed on the unit sphere in $\mathbb{R}^d$.

Basic geometric reasoning reveals that $I_k^d(\varepsilon)$ is the proportion of points on the sphere $S^{d-1}$ that lie within an
angle $\arccos(\sqrt{\varepsilon})$ of the subspace $L_k$. Our terminology derives from the approximate geographical fact that
the tropics lie within a fixed angle (23° 26') of the equator; the usual term regularized incomplete beta function
is longer and less evocative.

The core fact in conic integral geometry is the spherical Steiner formula [Her43, All48, San50], which
describes the fraction of points on the sphere that lie at most a fixed angle from a closed convex cone.

Fact 5.8 (Spherical Steiner formula). Let $C \in \mathcal{C}_d$ be a closed convex cone. For each $\varepsilon \in [0, 1]$,

$$ \mathbb{P}\{ \|\Pi_C(\boldsymbol{\theta})\|^2 \ge \varepsilon \} = \sum_{k=0}^d v_k(C) \, I_k^d(\varepsilon). \tag{5.8} $$

The spherical Steiner formula often serves as the definition of conic intrinsic volumes; it can also be derived
from the definition here. For a modern proof of Fact 5.8 in the spirit of this work, see [SW08, Thm. 6.5.1].

The second essential result is the conic kinematic formula, which provides an exact expression for the 
probability that a randomly oriented cone strikes a fixed cone. 

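Fact 5.9 (Conic kinematic formula). Let $C, K \in \mathcal{C}_d$ be closed convex cones, and assume that $C$ is not a subspace.
Then

$$ \mathbb{P}\{ C \cap \mathbf{Q}K \ne \{\mathbf{0}\} \} = 2h_{d+1}(C \times K). \tag{5.9} $$

For a linear subspace $L_{d-m} \subset \mathbb{R}^d$ with dimension $d - m$, this expression reduces to the Crofton formula

$$ \mathbb{P}\{ C \cap \mathbf{Q}L_{d-m} \ne \{\mathbf{0}\} \} = 2h_{m+1}(C). \tag{5.10} $$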

See [SW08, p. 261] for a proof of Fact 5.9. In Section 7, we use deep properties of the intrinsic volumes to 
produce an approximate version of the conic kinematic formula, which ultimately delivers detailed information 
about phase transitions. 

Remark 5.10 (Extended kinematic formula). By induction, the kinematic formula generalizes to a family
$C, K_1, \dots, K_r \in \mathcal{C}_d$ of closed convex cones where $C$ is not a subspace:

$$\mathbb{P}\{C \cap Q_1 K_1 \cap \cdots \cap Q_r K_r \ne \{0\}\} = 2 h_{rd+1}(C \times K_1 \times \cdots \times K_r). \tag{5.11}$$

Each matrix $Q_i$ is an independent, random orthogonal basis. This result can be used to analyze demixing
problems with more than two constituents.

5.3. Revisiting the statistical dimension. The machinery of conic integral geometry offers several new 
insights about the statistical dimension, and it underscores the analogy with the dimension of a linear 
subspace. The next proposition demonstrates that the statistical dimension of a cone is the mean of the 
random variable on $\{0, 1, 2, \dots, d\}$ whose distribution is given by the intrinsic volumes of the cone. This fact
motivated us to select the terminology "statistical" dimension. 

Proposition 5.11 (Statistical dimension as mean intrinsic volume). For each closed convex cone $C \in \mathcal{C}_d$,

$$\delta(C) = \sum_{k=0}^{d} k\, v_k(C). \tag{5.12}$$

Proof. The spherical formulation (4.2) of statistical dimension shows that

$$\delta(C) = d\, \mathbb{E}\big[\|\Pi_C(\theta)\|^2\big] = d \int_0^1 \mathbb{P}\{\|\Pi_C(\theta)\|^2 \ge \varepsilon\}\, d\varepsilon.$$

We have used integration by parts to express the expectation as an integral of tail probabilities. The Steiner
formula (5.8) and the definition (5.7) of the tropic function allow us to write the probability as a sum:

$$\delta(C) = d \sum_{k=0}^{d} v_k(C) \int_0^1 \mathbb{P}\{\|\Pi_{L_k}(\theta)\|^2 \ge \varepsilon\}\, d\varepsilon = \sum_{k=0}^{d} v_k(C) \big( d\, \mathbb{E}\big[\|\Pi_{L_k}(\theta)\|^2\big] \big),$$

where $L_k$ is an arbitrary $k$-dimensional subspace. A second application of (4.2) shows that the parenthesis is
the statistical dimension of $L_k$, and Proposition 4.1(4) provides that $\delta(L_k) = k$. □
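As a concrete check of (5.12), consider the nonnegative orthant in $\mathbb{R}^d$. This example is an illustration added here, not taken from the text above: it assumes the standard fact that the orthant has binomial intrinsic volumes $v_k = 2^{-d}\binom{d}{k}$, and the function names are hypothetical. Under that assumption, the mean of the distribution should equal $d/2$.

```python
from math import comb

def orthant_intrinsic_volumes(d):
    """Intrinsic volumes v_0, ..., v_d of the nonnegative orthant in R^d (assumed binomial)."""
    return [comb(d, k) / 2 ** d for k in range(d + 1)]

def statistical_dimension(v):
    """Mean of the distribution on {0, ..., d} defined by the intrinsic volumes, as in (5.12)."""
    return sum(k * vk for k, vk in enumerate(v))

d = 12
v = orthant_intrinsic_volumes(d)
delta = statistical_dimension(v)
print(sum(v), delta)  # total mass and statistical dimension
```

The same two helper functions apply to any cone whose intrinsic volumes are available as a list.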

Proposition 5.11 has a significant consequence. Each intrinsic volume $v_k$ is a valuation on $\mathcal{C}_d$, so the
statistical dimension is also a valuation on $\mathcal{C}_d$. Equivalently, (i) the statistical dimension of the trivial cone
$\delta(\{0\}) = 0$, and (ii) if $C, K \in \mathcal{C}_d$ and $C \cup K \in \mathcal{C}_d$, then we have the inclusion-exclusion rule

$$\delta(C \cup K) = \delta(C) + \delta(K) - \delta(C \cap K).$$

This property is analogous to the inclusion-exclusion law for the dimension of a subspace.

The long-standing spherical Hadwiger conjecture posits that each continuous, rotation-invariant valuation on
$\mathcal{C}_d$ can be written as a linear combination of the conic intrinsic volumes. If this conjecture holds, then every
such valuation is determined by its values on linear subspaces. Under this surmise, we obtain a fundamental
characterization of the statistical dimension.

Proposition 5.12 (Statistical dimension is canonical). If the spherical Hadwiger conjecture holds, then the
statistical dimension $\delta$ is the unique continuous, rotation-invariant valuation on $\mathcal{C}_d$ that satisfies $\delta(L) = \dim(L)$
for each subspace $L \subset \mathbb{R}^d$.

A weaker version of the spherical Hadwiger conjecture does hold for geometric parameters known as curvature 
measures, which provides a rigorous claim that the statistical dimension is canonical under an additional 
technical assumption of "localizability"; see [SW08, p. 254 and Thm. 6.5.4]. For a discussion of the spherical 
Hadwiger conjecture, see the works [McM93, p. 976], [KR97, Sec. 11.5], and [SW08, p. 263]. The conjecture 
currently stands open for $d \ge 4$.

6. Intrinsic volumes concentrate near the statistical dimension 

The main technical result in this paper describes a deep new property of conic intrinsic volumes. The 
intrinsic volumes of a cone concentrate near the statistical dimension of the cone on a scale determined by 
the statistical dimension. 

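Theorem 6.1 (Concentration of intrinsic volumes). Let $C$ be a closed convex cone. Define the transition width

$$\omega(C) := \sqrt{\delta(C) \wedge \delta(C^\circ)},$$

and introduce the function

$$p_C(\lambda) := 4 \exp\left( \frac{-\lambda^2/8}{\omega^2(C) + \lambda} \right) \quad \text{for } \lambda \ge 0. \tag{6.1}$$

Then

$$k_- \le \delta(C) - \lambda + 1 \quad \Longrightarrow \quad t_{k_-}(C) \ge 1 - p_C(\lambda); \tag{6.2}$$

$$k_+ \ge \delta(C) + \lambda \quad \Longrightarrow \quad t_{k_+}(C) \le p_C(\lambda). \tag{6.3}$$

The tail functional $t_k$ is defined in (5.5). The operator $\wedge$ returns the minimum of two numbers.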

In other words, the sequence $\{t_k(C) : k = 0, 1, 2, \dots, d\}$ of tail functionals drops from one to zero near the
statistical dimension $\delta(C)$, and the transition occurs over a range of $O(\omega(C))$ indices. Owing to the fact (5.1)
that the intrinsic volumes form a probability distribution, we must conclude that the intrinsic volumes $v_k(C)$
are all negligible in size, except for those whose index $k$ is close to the statistical dimension $\delta(C)$. We learn
that the intrinsic volumes of a convex cone $C$ with statistical dimension $\delta(C)$ are qualitatively similar to the
intrinsic volumes of a subspace with dimension $[\delta(C)]$.

Theorem 6.1 contains additional information about the rate at which the tail functionals of a cone transit
from one to zero. To extract this information, it helps to note the weaker inequality

$$p_C(\lambda) \le \begin{cases} 4 e^{-\lambda^2/(16\, \omega^2(C))}, & 0 \le \lambda \le \omega^2(C); \\ 4 e^{-\lambda/16}, & \lambda \ge \omega^2(C). \end{cases} \tag{6.4}$$



We see that (6.2) and (6.3) are vacuous until $\lambda \approx 4\omega(C)$. As $\lambda$ increases, the function $p_C(\lambda)$ decays like the tail
of a Gaussian random variable with standard deviation $\le 2\sqrt{2}\, \omega(C)$. When $\lambda$ reaches $\omega^2(C)$, the decay slows
to match the tail of an exponential random variable with mean $\le 16$. In particular, the behavior of the tail
functionals depends on the intrinsic properties of the cone, rather than the ambient dimension.
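The bounds (6.2), (6.3), and (6.4) can be checked on a cone whose intrinsic volumes are known in closed form. The sketch below is an illustration under stated assumptions rather than part of the original argument: it uses the nonnegative orthant in $\mathbb{R}^d$, assumes the standard fact that its intrinsic volumes are binomial with statistical dimension $d/2$, and uses hypothetical helper names `p_bound` and `p_weak`.

```python
from math import comb, exp, ceil, floor

def p_bound(lam, omega2):
    """The function p_C(lambda) from (6.1), where omega2 stands for omega^2(C)."""
    return 4.0 * exp(-(lam ** 2 / 8.0) / (omega2 + lam))

def p_weak(lam, omega2):
    """The piecewise right-hand side of (6.4)."""
    if lam <= omega2:
        return 4.0 * exp(-lam ** 2 / (16.0 * omega2))
    return 4.0 * exp(-lam / 16.0)

d = 200
v = [comb(d, k) / 2 ** d for k in range(d + 1)]  # assumed intrinsic volumes of the orthant
tail = [sum(v[k:]) for k in range(d + 2)]         # t_k = v_k + ... + v_d, with t_{d+1} = 0
delta = d / 2                                     # statistical dimension of the orthant
omega2 = min(delta, d - delta)                    # delta(C) ^ delta(C polar), via the totality law (4.5)

for lam in (10, 30, 50, 80):
    k_plus = ceil(delta + lam)
    k_minus = floor(delta - lam + 1)
    p = p_bound(lam, omega2)
    assert tail[k_plus] <= p           # (6.3)
    assert tail[k_minus] >= 1.0 - p    # (6.2)
    assert p <= p_weak(lam, omega2)    # (6.4)
    print(lam, tail[k_plus], p)
```

For small $\lambda$ the value of $p_C(\lambda)$ exceeds one, so the first two checks pass trivially; this matches the remark that the bounds are vacuous until $\lambda$ is a moderate multiple of $\omega(C)$.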

In Section 6.1, we outline some connections between Theorem 6.1 and classical inequalities from Euclidean 
integral geometry. In Section 6.2, we argue that the result is nearly optimal. Afterward, in Sections 6.3 and 6.4,
we summarize the intuition behind the proof of Theorem 6.1, and we follow up with the technical details. 
Later, Sections 7-9 highlight applications in conic geometry, optimization theory, and signal processing. 

6.1. Parallels with Euclidean integral geometry. Motivated by analogies between conic and Euclidean
integral geometry, Schneider & Weil [SW08, p. 263] ask what relationships hold among the conic intrinsic
volumes $v_k$ of a convex cone. To the best of our knowledge, the literature contains only two nontrivial
results [GHS02]: Among all cones in $\mathcal{C}_d$ with a fixed value of $v_d$, a circular cone minimizes $v_{d-1}$ and
maximizes $v_0$. This fact is a consequence of spherical isoperimetry.

Our work provides a rich family of inequalities relating conic intrinsic volumes and tail functionals. For any
convex cone $C$, Theorem 6.1 implies that

$$v_k(C) \le s_k(C) := 4 \exp\left( \frac{-(k - \delta(C))^2/8}{\omega^2(C) + |k - \delta(C)|} \right) \quad \text{for each } k = 0, 1, 2, \dots, d.$$

This bound also relies on the trivial inequalities $v_k(C) \le t_k(C)$ and $v_k(C) \le 1 - t_{k+1}(C)$. Observe that the
sequence $\{s_k(C) : k = 0, 1, 2, \dots, d\}$ of upper bounds is strictly log-concave:

$$s_k^2(C) > s_{k-1}(C) \cdot s_{k+1}(C) \quad \text{for } k = 1, 2, 3, \dots, d - 1.$$

This point follows from the fact that $u \mapsto u^2/(\omega^2(C) + |u|)$ is strictly convex for every nontrivial convex cone
$C$. (Indeed, the Gauss-Bonnet formula (5.3) implies that $\tfrac{1}{2} \le \delta(C) \le d - \tfrac{1}{2}$, which ensures that $\omega^2(C) \ge \tfrac{1}{2}$.) We
see that the sequence of conic intrinsic volumes is dominated by a log-concave sequence; cf. the conjecture
in [Ame11, Conj. 4.4.16].

The Euclidean intrinsic volumes of a convex body also form a log-concave sequence, which is a corollary of 
the Alexandrov-Fenchel inequalities. Log-concavity of the Euclidean intrinsic volumes has many fundamental 
consequences, including the usual isoperimetric inequality for convex bodies, the Brunn-Minkowski inequality, 
and the Urysohn inequality [Sch93, Chap. 6]. Theorem 6.1 delivers a system of inequalities for conic intrinsic 
volumes that parallel these deep classical results. 

6.2. Optimality of Theorem 6.1. By considering circular cones, we discover that Theorem 6.1 is nearly
optimal. Let us proceed with a heuristic discussion that conveys the key ideas. Consider the cone $C = \mathrm{Circ}_d(\alpha)$.
For simplicity, we assume that $d = 2(n + 1)$ for a large integer $n$, and we abbreviate $q = \sin^2(\alpha)$. Proposition 4.3
shows that the statistical dimension $\delta(C) \approx d \sin^2(\alpha) \approx 2nq$ and $\delta(C^\circ) \approx 2n(1 - q)$. According to [Ame11,
Ex. 4.4.8], the odd intrinsic volumes of $C$ satisfy

$$v_{2k+1}(C) = \frac{1}{2} \binom{n}{k} q^k (1 - q)^{n-k} \quad \text{for } k = 0, 1, 2, \dots, n.$$

In other words, $2 v_{2k+1}(C) = \mathbb{P}\{X = k\}$ where $X \sim \mathrm{Binomial}(n, q)$. This observation invites us to study these
cones using probabilistic methods.

First, we approximate the width of the region over which the tail functionals of $C$ change from one to zero.
The interlacing result, Proposition 5.6, indicates that

$$t_{2k}(C) \approx 2 h_{2k+1}(C) = 2 \sum_{i=k}^{n} v_{2i+1}(C) = \mathbb{P}\{X \ge k\}.$$

Under appropriate assumptions on $q$, $k$, and $n$, the Central Limit Theorem implies

$$\mathbb{P}\{X \ge k\} = \mathbb{P}\left\{ \frac{X - nq}{\sqrt{nq(1 - q)}} \ge \frac{k - nq}{\sqrt{nq(1 - q)}} \right\} \approx \mathbb{P}\left\{ Z \ge \frac{k - nq}{\sqrt{nq(1 - q)}} \right\},$$

where $Z$ is a standard normal variable. For the index $2k = 2nq + \lambda \approx \delta(C) + \lambda$, we see that

$$t_{2k}(C) \approx \mathbb{P}\left\{ Z \ge \frac{\lambda}{2\sqrt{nq(1 - q)}} \right\} \approx \exp\left( \frac{-\lambda^2/4}{2nq(1 - q)} \right) \approx \exp\left( \frac{-\lambda^2/4}{((1 - q)\, \delta(C)) \wedge (q\, \delta(C^\circ))} \right) \quad \text{for } \lambda \gg 0.$$

The second approximation follows from the normal tail estimate $\mathbb{P}\{Z \ge z\} \approx e^{-z^2/2}$. Similarly, for the index
$2k = 2nq - \lambda \approx \delta(C) - \lambda$, we have

$$t_{2k}(C) \approx 1 - \exp\left( \frac{-\lambda^2/4}{((1 - q)\, \delta(C)) \wedge (q\, \delta(C^\circ))} \right) \quad \text{for } \lambda \gg 0.$$

In other words, the width of the transition region for the tail functionals of the circular cone $C$ really does
have order $\omega(C)$. Furthermore, considering the case where $q$ is close to zero, we discover that the constant in
the exponent of (6.1) lies within a factor two of the best possible.
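The heuristic comparison above can be explored numerically. The sketch below computes the exact binomial tail $\mathbb{P}\{X \ge k\}$, which the discussion uses as a proxy for $t_{2k}(C)$, at the index $2k = 2nq + \lambda$, and sets it beside the Gaussian-type estimate $\exp(-(\lambda^2/4)/(2nq(1-q)))$. The parameter values $n = 200$ and $q = 1/2$ and the function names are illustrative assumptions. For $q = 1/2$ the estimate coincides with Hoeffding's bound $\exp(-\lambda^2/(2n))$, so in this special case it is a genuine upper bound on the tail.

```python
from math import comb, exp

def binomial_tail(n, q, k):
    """P{X >= k} for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q ** i * (1 - q) ** (n - i) for i in range(k, n + 1))

def gaussian_heuristic(n, q, lam):
    """The estimate exp(-(lam^2 / 4) / (2 n q (1 - q))) for the tail at index 2k = 2nq + lam."""
    return exp(-(lam ** 2 / 4.0) / (2.0 * n * q * (1.0 - q)))

n, q = 200, 0.5
for lam in (10, 20, 40):
    k = int(n * q + lam / 2)  # chosen so that 2k = 2nq + lam
    print(lam, binomial_tail(n, q, k), gaussian_heuristic(n, q, lam))
```

Both columns decay at a comparable Gaussian rate in $\lambda$, which is the qualitative point of the optimality discussion.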

To argue that $p_C$ cannot have subgaussian decay when $\lambda$ is large relative to the statistical dimension $\delta(C)$,
we consider a Poisson limit of the binomial variable. Suppose that $q = b/n$ for a constant $b$, so the statistical
dimension $\delta(C) \approx 2b$. For $2k = 2b(1 + \lambda) \approx (1 + \lambda)\, \delta(C)$,

The tail estimate follows, for example, by applying Cramér's Theorem [DZ10, Thm. 2.2.3] to the binomial
random variable $X$. In other words, for a very small circular cone, the tail functionals decay only a little faster
than subexponential when the tail index $k$ is a multiple of the statistical dimension.

6.3. Heuristic proof of Theorem 6.1. The basic ideas behind the argument are easy to summarize, but the
details demand some effort. Let $C \in \mathcal{C}_d$ be a closed convex cone. Recall the Steiner formula (5.8):

$$\mathbb{P}\{\|\Pi_C(\theta)\|^2 \ge \varepsilon\} = \sum_{k=0}^{d} v_k(C)\, I_k^d(\varepsilon), \tag{6.5}$$

where $\theta$ is uniformly distributed on the sphere $S^{d-1}$ and the tropic function $I_k^d$ is defined in (5.7).

Concentration of measure on the sphere implies that the random variable $\|\Pi_C(\theta)\|^2$ is typically very close
to its expected value $\delta(C)/d$, determined by (4.2). Thus, the left-hand side of (6.5) is very close to one when
$\varepsilon d < \delta(C)$ and very close to zero when $\varepsilon d > \delta(C)$.

As for the right-hand side of (6.5), recall that the tropic function $I_k^d(\varepsilon)$ is the proportion of points on the
sphere within a distance of $\sqrt{1 - \varepsilon}$ from a fixed $k$-dimensional subspace. Once again, concentration of measure
ensures that $I_k^d(\varepsilon)$ is close to zero when $k < \varepsilon d$ and close to one when $k > \varepsilon d$. Therefore, the sum on the
right-hand side of (6.5) is approximately equal to the tail functional $t_{\lceil \varepsilon d \rceil}(C)$.

Combining these two observations, we conclude that the sequence $\{t_k(C) : k = 0, 1, 2, \dots, d\}$ of tail functionals
makes a sharp transition from one to zero when $k \approx \delta(C)$. It remains to make this reasoning rigorous and to
determine the range of $k$ over which the transition takes place.
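The concentration step in this heuristic is easy to observe in simulation. The sketch below is illustrative only: it takes $C$ to be the nonnegative orthant in $\mathbb{R}^d$, for which the projection simply zeroes the negative coordinates and for which the statistical dimension is $d/2$ (a standard fact assumed here). The dimension, sample size, seed, and function name are arbitrary choices.

```python
import random

def projection_energy_orthant(d, rng):
    """Draw theta uniformly on the sphere in R^d and return ||Pi_C(theta)||^2 for the nonnegative orthant C."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm2 = sum(x * x for x in g)
    # Projection onto the orthant keeps the positive coordinates and zeroes the rest.
    return sum(x * x for x in g if x > 0) / norm2

rng = random.Random(1)
d, trials = 400, 2000
samples = [projection_energy_orthant(d, rng) for _ in range(trials)]
mean = sum(samples) / trials
spread = max(samples) - min(samples)
print(f"mean = {mean:.3f} (delta/d = 0.5), observed range = {spread:.3f}")
```

The samples cluster tightly around $\delta(C)/d = 1/2$, so the left-hand side of (6.5) switches from near one to near zero as $\varepsilon d$ crosses $\delta(C)$.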

6.4. Proof of Theorem 6.1. Let $C \in \mathcal{C}_d$ be a closed convex cone, and define $\varepsilon := k_+/d$. The first part of the
argument requires a technical lemma that we prove in Appendix D. This result quantifies how much of the
sphere in $\mathbb{R}^d$ lies within an angle $\arccos(\sqrt{k/d})$ of a $k$-dimensional subspace.

Lemma 6.2 (The tropics). For all integers $0 \le k \le d$, the tropic function $I_k^d(k/d) \ge 0.3$.

We begin by expressing the tail functional $t_{k_+}(C)$ in terms of the probability that a spherical variable lies near
the cone $C$.

$$\begin{aligned}
t_{k_+}(C) &= \sum_{k=k_+}^{d} v_k(C) \left[ I_k^d(\varepsilon) + (1 - I_k^d(\varepsilon)) \right] \\
&\le \sum_{k=0}^{d} v_k(C)\, I_k^d(\varepsilon) + (1 - I_{k_+}^d(\varepsilon)) \sum_{k=k_+}^{d} v_k(C) \\
&\le \mathbb{P}\{\|\Pi_C(\theta)\|^2 \ge \varepsilon\} + 0.7\, t_{k_+}(C).
\end{aligned} \tag{6.6}$$

The first identity is the definition (5.5) of the tail functional. To reach the second line, we inspect the
definition (5.7) to see that $I_k^d(\varepsilon)$ is an increasing function of $k$ when the other parameters are fixed, so
$1 - I_k^d(\varepsilon) \le 1 - I_{k_+}^d(\varepsilon)$ for each $k \ge k_+$. In the third line, we invoke the Steiner formula (5.8) to rewrite
the first sum. The bound for the second sum follows from the definition (5.5) of the tail functional and from
Lemma 6.2, since $\varepsilon = k_+/d$.

Rearranging (6.6), we obtain the bound

$$t_{k_+}(C) \le 4\, \mathbb{P}\{d\, \|\Pi_C(\theta)\|^2 \ge d\varepsilon\} \le 4\, \mathbb{P}\{d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda\}. \tag{6.7}$$

The last inequality depends on the fact that $\varepsilon = k_+/d$ and the definition (6.3) of $k_+$. In other words, the tail
functional is dominated by the probability that a random point on the sphere is close to the cone.

To estimate the probability in (6.7), we need a tail bound for the squared norm of the projection of a 
spherical variable onto a cone. This result is encapsulated in the following lemma. The approach is more or 
less standard, so we defer the details to Appendix D. 

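Lemma 6.3 (Tail bound for conic projections). For each closed convex cone $C \in \mathcal{C}_d$,

$$\mathbb{P}\{d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda\} \le \exp\left( \frac{-\lambda^2/8}{\omega^2(C) + \lambda} \right) \quad \text{for } \lambda \ge 0. \tag{6.8}$$

Introducing (6.8) into (6.7), we reach the upper bound (6.3) on the tail functional.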

To develop the lower bound (6.2) on the tail functional $t_{k_-}(C)$, we use a polarity argument. Note that

$$t_{k_-}(C) = \sum_{k=k_-}^{d} v_k(C) = \sum_{k=0}^{d-k_-} v_k(C^\circ) = 1 - t_{d-k_-+1}(C^\circ). \tag{6.9}$$

The first identity is the definition (5.5) of the tail functional $t_{k_-}(C)$. The second relation holds because of the
fact (5.2) that polarity reverses intrinsic volumes, and the last part relies on (5.5) and the property (5.1) that
the intrinsic volumes sum to one. Owing to the totality law (4.5) and the definition (6.2) of $k_-$,

$$d - k_- + 1 = \delta(C^\circ) + \delta(C) - k_- + 1 \ge \delta(C^\circ) + \lambda.$$

Therefore, we may apply (6.3) to obtain an upper bound on the tail functional $t_{d-k_-+1}(C^\circ)$. Substitute this
bound into (6.9) to establish the lower bound on the tail functional $t_{k_-}(C)$ stated in (6.2).

7. Approximate kinematic bounds 

We are now prepared to establish an approximate version of the conic kinematic formula, expressed in 
terms of the statistical dimension. Most of the applied theorems in this paper ultimately depend on this result. 
The proof combines the exact kinematic formula (5.9) with the concentration of intrinsic volumes, guaranteed 
by Theorem 6.1. 

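Theorem 7.1 (Approximate kinematics). Assume that $\lambda \ge 0$. Let $C \subset \mathbb{R}^d$ be a convex cone that is not a subspace.
For a $(d - m)$-dimensional subspace $L_{d-m}$, it holds that

$$\begin{aligned}
m \ge \delta(C) + \lambda \quad &\Longrightarrow \quad \mathbb{P}\{C \cap Q L_{d-m} \ne \{0\}\} \le p_C(\lambda); \\
m \le \delta(C) - \lambda \quad &\Longrightarrow \quad \mathbb{P}\{C \cap Q L_{d-m} \ne \{0\}\} \ge 1 - p_C(\lambda).
\end{aligned} \tag{7.1}$$

For an arbitrary convex cone $K \subset \mathbb{R}^d$, it holds that

$$\begin{aligned}
\delta(C) + \delta(K) \le d - 2\lambda \quad &\Longrightarrow \quad \mathbb{P}\{C \cap QK \ne \{0\}\} \le p_C(\lambda) + p_K(\lambda); \\
\delta(C) + \delta(K) \ge d + 2\lambda \quad &\Longrightarrow \quad \mathbb{P}\{C \cap QK \ne \{0\}\} \ge 1 - (p_C(\lambda) + p_K(\lambda)).
\end{aligned} \tag{7.2}$$

The functions $p_C$ and $p_K$ are defined by the expression (6.1).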

Theorem 7.1 has an attractive interpretation. The first statement (7.1) shows that a randomly oriented
subspace with codimension $m$ is unlikely to share a ray with a fixed cone $C$, provided that the codimension
$m$ is larger than the statistical dimension $\delta(C)$ of the cone. When the codimension $m$ is smaller than the
statistical dimension $\delta(C)$, the subspace and the cone are likely to share a ray.

The transition in behavior expressed in (7.1) takes place when the codimension $m$ of the subspace changes
by about $\omega(C) = \sqrt{\delta(C) \wedge \delta(C^\circ)}$. This point explains why the empirical success curves taper in the corners of the
graphs in Figure 2.2. Indeed, on the bottom-left side of each panel, the relevant descent cone is small; on the
top-right side of each panel, the descent cone is large, so its polar is small. In these regimes, the result (7.1)
shows that the phase transition must occur over a narrow range of codimensions.

The second statement (7.2) provides analogous results for the probability that a randomly oriented cone
shares a ray with a fixed cone. This event is unlikely when the total statistical dimension of the two cones
is smaller than the ambient dimension; it is likely to occur when the total statistical dimension exceeds the
ambient dimension.

For the case of two cones, it is harder to analyze the size of the transition region. Since the probability
bounds in (7.2) are controlled by the sum $p_C(\lambda) + p_K(\lambda)$, we can only be certain that the probability estimate
depends on the larger of the two quantities. It follows that the width of the transition does not exceed the
larger of $\omega(C) = \sqrt{\delta(C) \wedge \delta(C^\circ)}$ and $\omega(K) = \sqrt{\delta(K) \wedge \delta(K^\circ)}$. This observation is sufficient to explain why the
empirical success curves taper at the top-left and bottom-right of the graphs in Figure 2.3. Indeed, these are
the regions where one of the descent cones is small and the polar of the other descent cone is small.
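To make the subspace statement (7.1) concrete, the sketch below evaluates the resulting probability bounds for a cone with a given statistical dimension. It is a hedged illustration: the function names are hypothetical, the width uses $\omega^2(C) = \delta(C) \wedge (d - \delta(C))$ via the totality law (4.5), the choice $\lambda = |m - \delta(C)|$ is the largest value permitted by (7.1), and the values $\delta = 50$, $d = 100$ are arbitrary. The cone is assumed not to be a subspace, as the theorem requires.

```python
from math import exp

def p_bound(lam, delta, d):
    """p_C(lambda) from (6.1), with omega^2(C) = min(delta, d - delta) by the totality law (4.5)."""
    omega2 = min(delta, d - delta)
    return 4.0 * exp(-(lam ** 2 / 8.0) / (omega2 + lam))

def subspace_intersection_bounds(delta, d, m):
    """Bounds implied by (7.1) on P{C shares a ray with a random subspace of codimension m}.

    Returns (lower, upper), clipped to [0, 1], using lambda = |m - delta|.
    """
    lam = abs(m - delta)
    p = min(1.0, p_bound(lam, delta, d))
    if m >= delta:
        return 0.0, p        # first line of (7.1)
    return 1.0 - p, 1.0      # second line of (7.1)

for m in (10, 50, 90):
    print(m, subspace_intersection_bounds(delta=50.0, d=100, m=m))
```

At $m = \delta(C)$ the bounds are uninformative, and they tighten as $m$ moves away from the statistical dimension by more than a few multiples of $\omega(C)$.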

In the next subsection, we proceed with the proof of Theorem 7.1. Afterward, we derive Theorem I from a 
similar, but slightly easier argument. 

7.1. Proof of Theorem 7.1. We may assume that $C$ and $K$ are both closed because $\mathbb{P}\{C \cap QK \ne \{0\}\} =
\mathbb{P}\{\overline{C} \cap Q\overline{K} \ne \{0\}\}$, where the overline denotes closure. This is a subtle point that follows from the discussion of
touching probabilities located in [SW08, pp. 258-259].

Let us begin with the first set (7.1) of results, concerning the probability that a randomly oriented
subspace strikes a fixed cone. Consider the first implication, which operates when $m \ge \delta(C) + \lambda$. The Crofton
formula (5.10) shows that

$$\mathbb{P}\{C \cap Q L_{d-m} \ne \{0\}\} = 2 h_{m+1}(C) \le t_m(C),$$

where the inequality depends on the interlacing result, Proposition 5.6. The concentration of intrinsic volumes,
Theorem 6.1, demonstrates that the tail functional satisfies the bound

$$t_m(C) \le p_C(\lambda) \quad \text{when } m \ge \delta(C) + \lambda.$$

This completes the first bound. The second result, which holds when $m \le \delta(C) - \lambda$, follows from a parallel
argument.

The conic kinematic formula is required for the second set (7.2) of results, which concern the probability
that a randomly oriented cone strikes a fixed cone. Consider the situation where $\delta(C) + \delta(K) \le d - 2\lambda$. The
kinematic formula (5.9) yields

$$\mathbb{P}\{C \cap QK \ne \{0\}\} = 2 h_{d+1}(C \times K) \le t_d(C \times K), \tag{7.3}$$

where the inequality follows from Proposition 5.6.

We rely on a simple lemma to bound the tail functional of the product in terms of the individual tail
functionals. The proof appears in Appendix D.

Lemma 7.2 (Tail functionals of a product). Let $C$ and $K$ be closed convex cones. Then

$$t_{\lceil \delta(C) + \delta(K) + 2\lambda \rceil}(C \times K) \le t_{\lceil \delta(C) + \lambda \rceil}(C) + t_{\lceil \delta(K) + \lambda \rceil}(K).$$

Since the tail functionals are weakly decreasing, our assumption that $\delta(C) + \delta(K) \le d - 2\lambda$ implies that

$$t_d(C \times K) \le t_{\lceil \delta(C) + \delta(K) + 2\lambda \rceil}(C \times K) \le t_{\lceil \delta(C) + \lambda \rceil}(C) + t_{\lceil \delta(K) + \lambda \rceil}(K).$$

Theorem 6.1 delivers an upper bound of $p_C(\lambda) + p_K(\lambda)$ for the right-hand side. Introduce these bounds into
the probability inequality (7.3) to complete the proof of the first statement in (7.2). The second result follows
from an analogous argument.

7.2. Proof of Theorem I. The simplified kinematic bound of Theorem I involves an argument similar to
the proof of Theorem 7.1. First, assume that $\delta(C) + \delta(K) \le d - \lambda$. The product rule (4.6) for cones states that
$\delta(C \times K) = \delta(C) + \delta(K)$, so the implication (6.3) in Theorem 6.1 yields

$$t_d(C \times K) \le p_{C \times K}(\lambda) \le 4 \exp\left( \frac{-\lambda^2/8}{d + \lambda} \right) \le 4 e^{-\lambda^2/(16d)} \quad \text{for } 0 \le \lambda \le d. \tag{7.4}$$

The second relation holds because the totality rule (4.5) ensures that $\delta(C \times K) \le d$ or $\delta((C \times K)^\circ) \le d$. Substitute
the inequality (7.4) into the kinematic bound (7.3). Then make the change of variables $\lambda \mapsto a_\eta \sqrt{d}$, where
$a_\eta := 4\sqrt{\log(4/\eta)}$, to obtain the estimate

$$\mathbb{P}\{C \cap QK \ne \{0\}\} \le \eta.$$

This establishes the first part of Theorem I. The argument for the second part is cut from the same pattern.
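The change of variables at the end of this proof is a short piece of arithmetic that can be verified directly. The sketch below, with hypothetical function names and an arbitrary dimension $d = 1000$, confirms that substituting $\lambda = a_\eta \sqrt{d}$ into the final bound of (7.4) returns the target probability $\eta$, and reports whether $\lambda$ stays in the range $0 \le \lambda \le d$ where (7.4) applies.

```python
from math import exp, log, sqrt

def a_eta(eta):
    """The constant a_eta = 4 sqrt(log(4 / eta)) from the proof of Theorem I."""
    return 4.0 * sqrt(log(4.0 / eta))

def simplified_bound(lam, d):
    """The final bound 4 exp(-lam^2 / (16 d)) in (7.4)."""
    return 4.0 * exp(-lam ** 2 / (16.0 * d))

d = 1000
for eta in (0.1, 0.01, 0.001):
    lam = a_eta(eta) * sqrt(d)
    # Columns: target eta, transition width lam, bound at lam, and whether lam <= d.
    print(eta, round(lam, 2), simplified_bound(lam, d), lam <= d)
```

The width $\lambda$ grows like $\sqrt{d \log(1/\eta)}$, so the transition region is narrow relative to the ambient dimension.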
Figure 8.1: Phase transitions in random cone problems. For each cone $C_i$ in (8.2), we plot the empirical
probability that the cone program (8.1) with random affine constraints is feasible. The solid gray curve traces
the logistic fit to the data, and the finely dashed line is the empirical 50% success threshold, computed from
the regression model. The coarsely dashed line marks the statistical dimension $\delta(C_i)$, which is the theoretical
estimate for the location of the phase transition.



8. Application: Cone programs with random constraints 

The concentration of intrinsic volumes has far-reaching consequences for the theory of optimization. This 
section describes a new type of phase transition phenomenon that appears in a cone program with random 
affine constraints. We begin with a theoretical result, and then we exhibit some numerical examples that 
confirm the analysis. 

8.1. Cone programs. A cone program is a convex optimization problem with the following structure:

$$\text{minimize} \quad \langle u, x \rangle \quad \text{subject to} \quad Ax = b \quad \text{and} \quad x \in C, \tag{8.1}$$

where $C \in \mathcal{C}_d$ is a closed convex cone. The decision variable $x \in \mathbb{R}^d$, and the problem data consists of a vector
$u \in \mathbb{R}^d$, a matrix $A \in \mathbb{R}^{m \times d}$, and another vector $b \in \mathbb{R}^m$. This formalism includes several fundamental classes of
convex programs:

(1) Linear programs. If $C = \mathbb{R}^d_+$, then (8.1) reduces to a linear program in standard form.

(2) Second-order cone programs. If $C = \mathbb{L}^{d+1}$, then (8.1) is a type of second-order cone program.

(3) Semidefinite programs. When $C = \mathbb{S}^{n \times n}_+$, we recover the class of (real) semidefinite programs.

In addition to their flexibility and modeling power, cone programs enjoy effective algorithms and a crisp
theory. We refer to [BTN01] for further details.

The cone program (8.1) can exhibit several interesting behaviors. Let us remind the reader of the
terminology. A point $x$ that satisfies the constraints $Ax = b$ and $x \in C$ is called a feasible point, and the cone
program is infeasible when no feasible point exists. The cone program is unbounded when there exists a
sequence $\{x_k\}$ of feasible points with the property $\langle u, x_k \rangle \to -\infty$.

Our theory allows us to analyze the properties of a random cone program. It turns out that the number $m$ of
affine constraints controls whether the cone program is infeasible or unbounded.

Theorem 8.1 (Phase transitions in cone programming). Let $C \in \mathcal{C}_d$ be a closed convex cone. Consider the cone
program (8.1) where the vector $b \ne 0$ is fixed while the vector $u \in \mathbb{R}^d$ and the matrix $A \in \mathbb{R}^{m \times d}$ have independent
standard normal entries. Then

$$\begin{aligned}
m \le \delta(C) - \lambda \quad &\Longrightarrow \quad \text{(8.1) is unbounded with probability} \ge 1 - p_C(\lambda); \\
m \ge \delta(C) + \lambda \quad &\Longrightarrow \quad \text{(8.1) is infeasible with probability} \ge 1 - p_C(\lambda).
\end{aligned}$$

The function $p_C$ is defined by the expression (6.1).
Table 8.1: Empirical versus theoretical phase transitions. For each cone $C_i$ listed in (8.2), we compare the
theoretical location of the phase transition, equal to the statistical dimension $\delta(C_i)$, with the empirical location,
computed from the logistic regression model in Figure 8.1. The last column lists the errors, relative to the
dimension $d = 396$ of the problem.

Cone     $\delta(C_i)$: Theoretical     $\mu_i$: Empirical     $|\delta(C_i) - \mu_i|/d$: Error
$C_1$    66.67                          66.88                  0.054%
$C_2$    51.00                          51.70                  0.177%
$C_3$    35.50                          36.06                  0.141%


Proof. Amelunxen & Bürgisser [AB12, Thm. 1.3] have shown that the intrinsic volumes of the cone $C$ control
the properties of the random cone program (8.1):

$$\begin{aligned}
\mathbb{P}\{\text{(8.1) is infeasible}\} &= 1 - t_m(C); \\
\mathbb{P}\{\text{(8.1) has a unique minimizer}\} &= v_m(C); \\
\mathbb{P}\{\text{(8.1) is unbounded}\} &= t_{m+1}(C).
\end{aligned}$$

We apply Theorem 6.1 to see that the tail functional $t_{m+1}(C)$ is extremely close to one when the number $m$ of
constraints is smaller than the statistical dimension $\delta(C)$. Likewise, $t_m(C)$ is extremely close to zero when the
number $m$ of constraints is larger than the statistical dimension. We omit the details, which are analogous
to the proof of Theorem 7.1. □
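The three identities quoted from [AB12, Thm. 1.3] can be evaluated whenever the intrinsic volumes are known. The sketch below does this for a random linear program, that is, for the nonnegative orthant, under the assumption (not stated in this section) that the orthant has binomial intrinsic volumes $v_k = 2^{-d}\binom{d}{k}$. The function name and the values $d = 40$ and $m \in \{10, 20, 30\}$ are illustrative choices.

```python
from math import comb

def cone_program_probabilities(v, m):
    """Outcome probabilities from [AB12, Thm. 1.3], given intrinsic volumes v[0..d] and m <= d constraints."""
    def tail(k):
        # Tail functional t_k = v_k + v_{k+1} + ... + v_d (an empty sum when k > d).
        return sum(v[k:])
    return {
        "infeasible": 1.0 - tail(m),
        "unique minimizer": v[m],
        "unbounded": tail(m + 1),
    }

d = 40
v = [comb(d, k) / 2 ** d for k in range(d + 1)]  # assumed: orthant, so delta(C) = d / 2 = 20
for m in (10, 20, 30):
    probs = cone_program_probabilities(v, m)
    print(m, {name: round(p, 4) for name, p in probs.items()})
```

The three probabilities sum to one for every $m$, and the dominant outcome switches from unbounded to infeasible as $m$ passes the statistical dimension.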

8.2. A numerical example. We have conducted a computer experiment to compare the predictions of
Theorem 8.1 with the empirical behavior of a generic cone program. For this purpose, we study some random
second-order cone programs. In each case, the ambient dimension $d = 396$, and we consider three options for
the cone $C$ in (8.1):

$$C_1 := \mathrm{Circ}_d(\alpha_1); \tag{8.2a}$$

$$C_2 := \mathrm{Circ}_{d/2}(\alpha_2) \times \mathrm{Circ}_{d/2}(\alpha_2); \tag{8.2b}$$

$$C_3 := \mathrm{Circ}_{d/3}(\alpha_3) \times \mathrm{Circ}_{d/3}(\alpha_3) \times \mathrm{Circ}_{d/3}(\alpha_3). \tag{8.2c}$$

The angles satisfy $\tan^2(\alpha_1) = \tfrac{1}{5}$ and $\tan^2(\alpha_2) = \tfrac{1}{7}$ and $\tan^2(\alpha_3) = \tfrac{1}{11}$. Using the product rule (4.6) and the
integral expression (C.1) for the statistical dimension of a circular cone, numerical quadrature yields

$$\delta(C_1) \approx 66.67; \qquad \delta(C_2) \approx 51.00; \qquad \delta(C_3) \approx 35.50.$$

Theorem 8.1 indicates that a cone program (8.1) with the cone $C_i$ and generic constraints is likely to be
feasible when the number $m$ of affine constraints is smaller than $\delta(C_i)$; it is likely to be infeasible when the
number $m$ of affine constraints is larger than $\delta(C_i)$.

We can test this prediction numerically. For each i = 1,2,3 and each m e {1,2,3,..., we perform the 

following steps 50 times: 

(1) Independently draw a standard normal matrix $A \in \mathbb{R}^{m \times d}$ and standard normal vectors $u \in \mathbb{R}^d$ and $b \in \mathbb{R}^m$.

(2) Use the Matlab package CVX to solve the cone program (8.1) with $C = C_i$.

(3) Report failure if CVX declares the cone program infeasible.

For each $i = 1, 2, 3$, Figure 8.1 displays the empirical success probability, along with a logistic fit (Appendix A.3).
We also mark the theoretical estimate for the location of the phase transition, which is equal to the statistical
dimension $\delta(C_i)$. Table 8.1 reports the discrepancy between the theoretical and empirical behaviors.

9. Application: Vectors from lists? 

This section describes a situation where our results prove that a particular linear inverse problem does not 
provide an effective way to recover a structured vector. Indeed, a significant contribution of our theory, which 
has no parallel in the current literature, is that we can obtain negative results as well as positive results. 

In [CRPW12, Sec. 2.2], Chandrasekaran et al. propose a method for recovering a vector from an unordered
list of its entries, along with some linear measurements. Here is one way to frame this problem. Suppose
that $x_0 \in \mathbb{R}^d$ is an unknown vector. We are given the vector $y_0 = x_0^{\downarrow}$, whose entries list the components of $x_0$ in
weakly decreasing order. We also collect data $z_0 = Ax_0$ where $A$ is an $m \times d$ matrix. To identify $x_0$, we must
solve a structured linear inverse problem.

To solve this problem, Chandrasekaran et al. propose to use a convex regularizer $f$ that exploits the
information in the vector $y_0$. They consider the Minkowski gauge of the permutahedron $\mathcal{P}(y_0)$ generated by $y_0$,

$$\|x\|_{\mathcal{P}(y_0)} := \inf\{ t > 0 : x \in t\, \mathcal{P}(y_0) \},$$

and they frame the regularized linear inverse problem

$$\text{minimize} \quad \|x\|_{\mathcal{P}(y_0)} \quad \text{subject to} \quad z_0 = Ax. \tag{9.1}$$

It is natural to ask how many linear samples we need to be able to solve this inverse problem reliably. Our
theory allows us to answer this question decisively when the measurements are random.

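Proposition 9.1 (Vectors from lists?). Let $x_0 \in \mathbb{R}^d$ be a fixed vector with distinct entries. Suppose we are given
the data $y_0 = x_0^{\downarrow}$ and $z_0 = Ax_0$, where the matrix $A \in \mathbb{R}^{m \times d}$ has standard normal entries. In the range $0 \le \lambda \le \sqrt{H_d}$,
it holds that

$$\begin{aligned}
m \le d - H_d - \lambda \sqrt{H_d} \quad &\Longrightarrow \quad \text{(9.1) succeeds with probability} \le 4 e^{-\lambda^2/16}; \\
m \ge d - H_d + \lambda \sqrt{H_d} \quad &\Longrightarrow \quad \text{(9.1) succeeds with probability} \ge 1 - 4 e^{-\lambda^2/16}.
\end{aligned}$$

The $d$th harmonic number satisfies $\log d < H_d \le 1 + \log d$.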

Proposition 9.1 yields the depressing assessment that we need a near-complete set of linear measurements to 
resolve our uncertainty about the ordering of the vector. Nevertheless, we do not need all of the measurements. 
It would be interesting to understand how much the situation improves for vectors with many duplicated 
entries. 
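The quantities in Proposition 9.1 are simple to tabulate. The sketch below computes the harmonic number $H_d$, the predicted transition location $d - H_d$, and the two measurement thresholds from the proposition; the function names are hypothetical, and $d = 100$ is chosen only because it matches the numerical example in Section 9.1.

```python
from math import log, sqrt

def harmonic(d):
    """The d-th harmonic number H_d = 1 + 1/2 + ... + 1/d."""
    return sum(1.0 / j for j in range(1, d + 1))

def list_recovery_thresholds(d, lam):
    """Measurement counts (failure below, success above) from Proposition 9.1, for 0 <= lam <= sqrt(H_d)."""
    h = harmonic(d)
    return d - h - lam * sqrt(h), d - h + lam * sqrt(h)

d = 100
h = harmonic(d)
print(f"H_d = {h:.4f}, d - H_d = {d - h:.2f}")
print(list_recovery_thresholds(d, lam=sqrt(h)))
```

Because $H_d$ grows only logarithmically, the transition sits within a few measurements of the full dimension $d$, which is the pessimistic message of the proposition.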

Proof. This result follows from Fact 2.6 and the kinematic bound (7.1) in Theorem 7.1 as soon as we compute
the statistical dimension of the descent cone of the regularizer $\|\cdot\|_{\mathcal{P}(y_0)}$ at the point $x_0$. By construction, the
unit ball of $\|\cdot\|_{\mathcal{P}(y_0)}$ coincides with the permutahedron $\mathcal{P}(y_0)$, which equals $\mathcal{P}(x_0)$ by permutation invariance.
Therefore,

The second identity follows from (3.1). See Figure 4.2 for an illustration of the corresponding facts about
signed permutahedra. To compute the statistical dimension, we apply the totality law (4.5) to see that

$$\delta\big(\mathcal{D}(\|\cdot\|_{\mathcal{P}(y_0)}, x_0)\big) = d - \delta\big(\mathcal{N}(\mathcal{P}(x_0), x_0)\big) = d - H_d,$$

where the second relation follows from Proposition 4.9. Apply the kinematic result (7.1) for subspaces, and
invoke (6.4) to simplify the error bound $p_C(\lambda)$. □

Remark 9.2 (Signed vectors). The same negative results hold for the problem of reconstructing a general
vector from an unordered list of the magnitudes of its entries, along with some linear measurements. In
this case, the appropriate regularizer is the Minkowski gauge of the signed permutahedron. We can use
Proposition 4.9 to compute the statistical dimension of the descent cone. For a $d$-dimensional vector with
distinct entries, we need about $d - \tfrac{1}{2} H_d$ random measurements to succeed reliably.

9.1. A numerical example. We present a computer experiment that confirms our pessimistic analysis. Fix
the ambient dimension $d = 100$. Set $x_0 = (1, 2, \dots, 100)$ and $y_0 = x_0^{\downarrow}$. For each $m = 85, 86, \dots, 100$, we repeat the
following procedure 50 times:

(1) Draw a matrix $A \in \mathbb{R}^{m \times d}$ with independent standard normal entries, and form $z_0 = Ax_0$.

(2) Use the Matlab package CVX to solve the linear inverse problem (9.1).

(3) Declare success if the solution $\hat{x}$ satisfies $\|\hat{x} - x_0\| \le 10^{-5}$.

Figure 9.1 displays the outcome of this experiment. As usual, the phase transition predicted at the statistical
dimension $d - H_d$ is very close to the empirical 50% mark, which we obtain by performing a logistic regression
of the data (see Appendix A.3).
[Figure: probability of finding a vector from a list, plotted against the number of random measurements.]

Figure 9.1: Vectors from lists? The empirical probability that the convex program (9.1) correctly identifies a
vector $x_0$ in $\mathbb{R}^{100}$ with distinct entries, provided an unordered list $y_0$ of the entries of $x_0$ and $m$ random linear
measurements $z_0 = Ax_0$. The solid gray curve marks the logistic fit to the data. The midpoint of the logistic curve
is $\mu = 95.17$ (finely dashed line), while the theory predicts a phase transition at the statistical dimension $\delta = 94.81$
(coarsely dashed line). The error relative to the dimension is 0.36%.

10. Related work 

To conclude the body of the paper, we place our work in the context of the literature on geometric analysis 
of random convex optimization problems. We trace four lines of thought on this subject. The first draws from 
the theory of polytope angles; the second involves conic integral geometry; the third is based on comparison 
inequalities for Gaussian processes; and the last makes a connection with statistical decision theory. Our 
results have some overlap with earlier work, but our discovery that the sequence of conic intrinsic volumes 
concentrates at the statistical dimension allows us to resolve several subtle but important questions that have 
remained open until now. 

10.1. Asymptotic polytope-angle computations for inverse problems. The theory of polytope angles 
dates to the work of Schläfli in the 1850s [Sch50b]. In pioneering research, Vershik & Sporyshev [VS86]
applied these ideas to analyze random convex optimization problems. They were able to estimate the average
number of steps that the simplex algorithm requires to solve a linear program with random constraints as
the number of decision variables tends to infinity. This research inspired further theoretical work on the
neighborliness of random polytopes [VS92, AS92, BH99]. More recently, Donoho [Don06b] and Donoho &
Tanner [DT05, DT09a, DT10a, DT10b] have used similar ideas to study specific regularized linear inverse
problems with random data. The papers [XH11, KXAH11] contain some additional work in this direction. Let
us offer a short, qualitative summary of this research. 

Donoho [Don06b] analyzed the performance of the convex program (1.1) for solving the compressed 
sensing problem described in Section 1. In the asymptotic regime where the number s of nonzeros is 
proportional to the ambient dimension $d$, he obtained a lower bound $m > \psi(s)$ on the number $m$ of Gaussian 
measurements required for the optimization to succeed (the weak transition). Numerical experiments [DT09b] 
suggest that this bound is sharp, but the theoretical analysis in [Don06b] falls short of establishing that a 
phase transition actually exists and identifying its location rigorously. Finite-dimensional results with a similar 
flavor appear in [DT10b]. 

Donoho [Don06b] also established an asymptotic lower bound on the number m of random measurements 
required to recover all $s$-sparse vectors in $\mathbb{R}^d$ with high probability (the strong threshold). Using different 
methods, Stojnic [Sto09] has improved this bound for some values of the sparsity $s$. These bounds are not 
subject to numerical interrogation, so we do not have reliable evidence about what actually happens. Indeed, 
it remains an open question to prove that a strong phase transition exists and to identify its exact location in 
the regime where the sparsity is proportional to the ambient dimension. 

Donoho & Tanner [DT09a] have also made a careful study of the behavior of the convex program (1.1) 
in the asymptotic regime where the sparsity $s \ll d$. In this case, they succeeded in proving that weak and 
strong thresholds exist, and they obtained exact formulas for the thresholds. More precisely, at the computed 
thresholds, they show that the probability of success jumps from one to $1 - \varepsilon$, where $\varepsilon$ is positive. Although 
these results do not ensure that certain failure awaits on the other side of the threshold curve, they do 
establish that the behavior changes. 

Donoho & Tanner [DT05, DT09a] provide similar results for the problem of recovering a sparse nonnegative 
vector by solving the $\ell_1$ minimization problem (1.1) with an additional nonnegativity constraint. Once again, 
they obtain lower bounds on the number of Gaussian measurements required for weak and strong success. 
These bounds are sharp in the ultrasparse regime $s = o(\log d)$. The earlier work of Vershik & 
Sporyshev [VS86] contains results equivalent to the weak transition estimate from [DT09a]. 

Other authors have used polytope angle calculations to develop theory for related $\ell_1$ minimization problems. 
For example, Khajehnejad et al. [KXAH11] provide an analysis of the performance of weighted $\ell_1$ regularizers. 
Xu & Hassibi [XH11] obtain lower bounds for the number of measurements required for stable recovery of a 
sparse vector via $\ell_1$ minimization. 

Finally, let us mention that Donoho & Tanner [DT10a] have obtained a more complete theory about the 
location of the phase transition for a regularized linear inverse problem where the $\ell_\infty$ norm is used as a 
regularizer. Their results, formulated in terms of projections of the hypercube and orthant, are a consequence 
of a geometric theorem that goes back to Schläfli [Sch50a]; see the discussion [SW08, p. 299]. 

10.1.1. Commentary. The analysis of structured inverse problems by means of polytope angle computations 
has led to some striking conclusions, but this approach has inherent limitations. First, the method is restricted 
to polyhedral cones, which means that it is silent about the behavior of many important regularizers, including 
the Schatten 1-norm. Second, it requires detailed bounds on all angles of a given polytope (equivalently all 
the intrinsic volumes of the normal cones of the polytope), which means that it is difficult to extend beyond a 
few highly symmetric examples. For this reason, most of the existing results are asymptotic in nature. Third, 
because of the intricacy of the calculations, this research has produced few definitive results of the form "the 
probability of success jumps from one to zero at a specified location." 

We believe that our analysis supersedes most of the research on weak phase transitions for $\ell_1$ regularized 
linear inverse problems that is based on polytope angles. We have shown for the first time that there is a 
transition from absolute success to absolute failure, and we have characterized the location of the threshold 
when the sparsity $s > \sqrt{d} + 1$. On the other hand, we have not verified that our upper bound for the statistical 
dimension of the descent cone of the $\ell_1$ norm at a sparse vector is sharp when $s < \sqrt{d}$. Currently, the 
paper [DT09a] contains the only authoritative results in the ultrasparse regime $s = o(\log d)$. 

It is not hard to extend our analysis to the other settings discussed in this section. Indeed, we can easily 
study regularized inverse problems involving weighted $\ell_1$ norms and $\ell_1$ norms with nonnegativity constraints. 
We can effortlessly rederive phase transitions for $\ell_\infty$ regularized problems. Bounds for strong transitions are 
also accessible to our methods. We have omitted all of this material for brevity. 

10.2. Conic intrinsic volumes. In modern geometry, work on polytope angles has largely been supplanted by 
research on spherical and conical integral geometry [SW08]. Several authors have independently recognized 
the power of this approach for analyzing random instances of convex optimization problems. 

Amelunxen [Ame11] and Amelunxen & Bürgisser [AB11, AB12] have shown that conic geometry offers 
an elegant way to perform average-case and smoothed analysis of conic optimization problems. Their work 
requires detailed computations of conic intrinsic volumes, which can make it challenging to apply to particular 
cases. We can simplify some of their techniques using the new fact, Theorem 6.1, that intrinsic volumes 
concentrate at the statistical dimension. Theorem 8.1 is based on their research. 

McCoy & Tropp [MT12] have used conic geometry to study the behavior of regularized linear inverse 
problems with random measurements and regularized demixing problems under a random model. This 
approach leads to both upper and lower bounds for weak and strong phase transitions in a variety of problems. 
As with Amelunxen's work [Ame11], this research depends on detailed computations of conic intrinsic 
volumes. As a consequence, it was not possible to rigorously locate the phase transition, nor was there any 
general theory to inform us that phase transitions must exist in general. Combining the ideas from [MT12] 
with Theorem 7.1, we are able to reach more definitive conclusions. 

10.3. Gordon's comparison and Gaussian widths. The work we have discussed so far depends on various 
flavors of integral geometry. There is a completely different technique for analyzing linear inverse problems 
with random data that depends on a comparison principle for Gaussian processes, due to Gordon [Gor85]. 
Gordon [Gor88] explains how to use this comparison to find accurate bounds on the probability that a 
randomly oriented subspace strikes a subset of the sphere. In particular, his ideas can be used to bound the 
probability that the null space of a random matrix intersects a descent cone, which results in a version of 
Theorem II. 

Rudelson & Vershynin [RV08] were the first authors to observe that Gordon's work is relevant to the 
analysis of the compressed sensing problem. Stojnic [Sto09] refined the method enough that he was able to 
establish empirically sharp lower bounds for the number m of measurements required for the compressed 
sensing problem. Oymak & Hassibi [OH10] have also applied these techniques to develop empirically sharp 
lower bounds on the number of random measurements needed to identify a low-rank matrix using the 
Schatten 1-norm. Later, Chandrasekaran et al. [CRPW12] showed that a similar approach leads to empirically 
sharp lower bounds on the number of random measurements required to solve other types of regularized 
inverse problems. 

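All of this work depends on a summary parameter for convex cones called the Gaussian width. For a closed 
convex cone $C \subset \mathbb{R}^d$, the width is defined as 

\[ w(C) := \mathbb{E} \sup_{x \in C \cap \mathsf{S}^{d-1}} \langle g, x \rangle, \tag{10.1} \]

where $g$ is a standard normal vector and $\mathsf{S}^{d-1}$ is the Euclidean unit sphere in $\mathbb{R}^d$. 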

The Gaussian width has a tight connection with the statistical dimension. 

Proposition 10.1 (Statistical dimension & Gaussian width). Let $C$ be a convex cone. Then 

\[ w^2(C) \leq \delta(C) \leq w^2(C) + 1. \tag{10.2} \]

The lower bound in (10.2) is an easy consequence of duality, and the upper bound depends on some 
concentration arguments. See Appendix E for a short proof. 
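
As a quick sanity check of (10.2), consider a $k$-dimensional linear subspace. Its statistical dimension equals $k$, and its Gaussian width is the expected norm of a standard normal vector in $\mathbb{R}^k$, which is the mean of a chi random variable with $k$ degrees of freedom. The following Python sketch evaluates that mean through the gamma function and tests both inequalities; the function names are illustrative and do not come from the paper.

```python
import math


def gaussian_width_subspace(k: int) -> float:
    """Gaussian width of a k-dimensional subspace, i.e. the mean of a
    chi random variable with k degrees of freedom:
    sqrt(2) * Gamma((k + 1) / 2) / Gamma(k / 2)."""
    log_ratio = math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
    return math.sqrt(2.0) * math.exp(log_ratio)


def check_width_bounds(k: int) -> bool:
    """Check w^2 <= delta <= w^2 + 1 with delta = k for a subspace."""
    w2 = gaussian_width_subspace(k) ** 2
    return w2 <= k <= w2 + 1.0


if __name__ == "__main__":
    for k in (1, 2, 10, 100, 1000):
        w2 = gaussian_width_subspace(k) ** 2
        print(k, round(w2, 4), check_width_bounds(k))
```

For subspaces the gap between the statistical dimension and the squared width stays strictly inside the unit interval allowed by (10.2), so the two parameters are interchangeable up to an additive constant.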

As a consequence of Proposition 10.1, we can import ideas from the literature on Gaussian widths to obtain 
accurate computations of the statistical dimension. Conversely, our calculations of the statistical dimension 
lead to accurate bounds for Gaussian widths. In particular, Theorem 4.5 provides the first proof that previous 
calculations of Gaussian widths are essentially sharp. 

We have a strong preference for the statistical dimension over the Gaussian width. Indeed, the statistical 
dimension canonically extends the linear dimension to the class of convex cones (Proposition 5.12). The 
statistical dimension also summarizes the sequence of conic intrinsic volumes (Proposition 5.11). Since 
intrinsic volumes drive the kinematic formula (5.9) and its generalization (5.11), this connection has many 
consequences for conic integral geometry. 

10.4. Minimax denoising. Several authors [DMM09a, DJM11, DGM13] have remarked on the power of 
statistical decision theory to empirically predict the location of the phase transition in a regularized linear 
inverse problem with random data. For the compressed sensing problem, two recent papers [BM12, BLM12] 
provide a rigorous explanation for this coincidence. But there is no general theory that illuminates the 
connection between these two settings. Our work, together with a recent paper of Oymak & Hassibi [OH12], 
resolves this issue. In short, Oymak & Hassibi show that the minimax risk for denoising is essentially the same 
as the statistical dimension, while our research proves that a phase transition must occur at the statistical 
dimension. Let us elaborate. 

A classical problem in statistics is to estimate a target vector $x_0$ given an observation of the form $z_0 = x_0 + \sigma g$ 
where $g$ is a standard normal vector and $\sigma^2$ is an unknown variance parameter. When the unknown vector $x_0$ 
has specified properties (e.g., sparsity), we can often construct a convex regularizer $f$ that promotes this type 
of structure [CRPW12]. A natural estimation procedure is to solve the convex optimization problem 

\[ \hat{x}_\gamma := \operatorname*{arg\,min}_{x} \ \gamma f(x) + \tfrac{1}{2} \|z_0 - x\|^2. \tag{10.3} \]

The regularization parameter $\gamma > 0$ negotiates a tradeoff between the structural penalty and the data fidelity 
term. One way to assess the performance of the estimator (10.3) is the minimax MSE risk,³ defined as 

\[ R_{\mathrm{mm}}(x_0) := \sup_{\sigma > 0} \inf_{\gamma > 0} \frac{1}{\sigma^2} \mathbb{E}\big[ \|\hat{x}_\gamma - x_0\|^2 \big]. \]

In other words, the risk identifies the relative mean-square error for the best choice of tuning parameter $\gamma$ at 
the worst choice of the noise variance $\sigma^2$. 

³ The usual definition of the minimax risk involves an additional supremum over a class of distributions on the target $x_0$. In many 
applications, the symmetries in the regularizer $f$ allow a straightforward reduction to the case of a fixed target $x_0$. 
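
To make the estimator (10.3) concrete, the following Python sketch treats the special case where the regularizer is the $\ell_1$ norm. In that case the problem separates across coordinates and its solution is the standard soft-thresholding rule. The code estimates the relative mean-square error for one fixed noise level and one fixed tuning parameter by Monte Carlo; it does not carry out the supremum and infimum in the definition of the risk. All function names and the choice of test vector are assumptions made for illustration.

```python
import random


def soft_threshold(z: float, gamma: float) -> float:
    """Minimizer of gamma * |x| + 0.5 * (z - x)**2 over real x."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def relative_mse(x0, sigma, gamma, trials=2000, seed=0):
    """Monte Carlo estimate of E||x_hat - x0||^2 / sigma^2, where
    z0 = x0 + sigma * g and x_hat soft-thresholds z0 at level gamma."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        for xi in x0:
            z = xi + sigma * rng.gauss(0.0, 1.0)
            total += (soft_threshold(z, gamma) - xi) ** 2
    return total / (trials * sigma ** 2)


if __name__ == "__main__":
    x0 = [1.0] * 5 + [0.0] * 45      # hypothetical sparse target
    for gamma in (0.0, 0.05, 0.1, 0.2):
        print(gamma, relative_mse(x0, sigma=0.1, gamma=gamma))
```

Sweeping `gamma` on a grid for several values of `sigma` gives a rough numerical picture of the inner infimum and outer supremum.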

The papers [DMM09a, DJM11, DGM13] examine several regularizers $f$ where the minimax risk empirically 
predicts the performance of the linear inverse problem (2.5) with a Gaussian measurement matrix A. The 
authors of this research propound a conjecture that may be expressed as follows. 

Conjecture 10.2 (Minimax risk predicts phase transitions). Suppose that $A \in \mathbb{R}^{m \times d}$ is a matrix with independent 
standard normal entries, and let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex function. Then 

\[ m \geq R_{\mathrm{mm}}(x_0) + o(d) \implies \text{(2.5) succeeds with probability } 1 - o(1); \]

\[ m \leq R_{\mathrm{mm}}(x_0) + o(d) \implies \text{(2.5) succeeds with probability } o(1). \]

The order notation here should be interpreted heuristically. 

To the best of our knowledge, this claim has been established rigorously only for the $\ell_1$ norm [BLM12, Thm. 8] 
in the asymptotic setting. The paper [BLM12] also includes analysis for a wider class of matrices. 

Together, our paper and the recent paper [OH12] settle Conjecture 10.2 in the nonasymptotic setting for 
many regularizers of interest. Indeed, Oymak & Hassibi [OH12] prove that 

\[ \big| R_{\mathrm{mm}}(x_0) - \delta(\mathcal{D}(f, x_0)) \big| = O(\sqrt{d}). \]

Their result holds under mild conditions on the regularizer $f$ that suffice to address most of the phase 
transitions conjectured in the literature. Our result, Theorem II, demonstrates that the phase transition in the 
linear inverse problem (2.5) with a standard normal matrix $A \in \mathbb{R}^{m \times d}$ occurs when 

\[ \big| m - \delta(\mathcal{D}(f, x_0)) \big| = O(\sqrt{d}). \]

Combining these two results, we conclude that, in some generality, the minimax risk coincides with the 
location of the phase transition in a regularized linear inverse problem with random measurements. 

Appendix A. Computer experiments 

We confirm the predictions of our theoretical analysis by performing computer experiments. This appendix 
contains some of the details of our numerical work. All experiments were performed using the CVX 
package [GB13] for Matlab with the default settings in place. 

A.1. Linear inverse problems with random measurements. This section describes the two experiments 
from Section 2.3 that illustrate the empirical phase transition in compressed sensing via $\ell_1$ minimization and 
in low-rank matrix recovery via Schatten 1-norm minimization. 

In the compressed sensing example, we fix the ambient dimension $d = 100$. For each $m = 1, 2, 3, \dots, d-1$ and 
each $s = 1, 2, 3, \dots, d-1$, we repeat the following procedure 50 times: 

(1) Construct a vector $x_0 \in \mathbb{R}^d$ with $s$ nonzero entries. The locations of the nonzero entries are selected at 
random; each nonzero equals $\pm 1$ with equal probability. 

(2) Draw a standard normal matrix $A \in \mathbb{R}^{m \times d}$, and form $z_0 = A x_0$. 

(3) Solve (2.6) to obtain an optimal point $\hat{x}$. 

(4) Declare success if $\|\hat{x} - x_0\| < 10^{-5}$. 

All random variables are drawn independently in each step and at each iteration. Figures 1.1 and 2.2[left] 
show the empirical probability of success for this procedure. 

We take a similar approach in the low-rank matrix recovery problem. Fix $n = 30$, and consider square $n \times n$ 
matrices. For each rank $r = 1, 2, \dots, n$ and each $m = 1, 29, 58, 87, \dots, n^2$, we repeat the following procedure 50 
times: 

(1) If $r > \lceil \sqrt{m}\, \rceil + 1$, declare failure because the number of degrees of freedom in an $n \times n$ rank-$r$ matrix 
exceeds the number m of measurements. 

(2) Draw a rank-$r$ matrix $X_0 = Q_1 Q_2^{\mathsf{T}}$, where $Q_1$ and $Q_2$ are independent $n \times r$ matrices with orthonormal 
columns, drawn uniformly from an appropriate Stiefel manifold [Mez07]. 

(3) Draw a standard normal matrix $A \in \mathbb{R}^{m \times n^2}$, and define $\mathcal{A}(X) := A \cdot \mathrm{vec}(X)$, where the vectorization 
operator stacks the columns of a matrix. Form the vector of measurements $z_0 = \mathcal{A}(X_0)$. 

(4) Solve (2.7) to obtain an optimal point $\hat{X}$. 

(5) Declare success if $\|\hat{X} - X_0\|_{\mathrm{F}} < 10^{-5}$. 

As before, all random variables are chosen independently. Readers interested in reproducing this experiment 
should be aware that this procedure required nearly one month to execute on a desktop workstation. 
Figure 2.2 [right] displays the results of this experiment. 

A.2. Statistical dimension curves. The formulas (4.11) and (4.14) for the statistical dimension of the 
descent cones of the $\ell_1$ norm and the Schatten 1-norm do not have a closed form representation. Nevertheless, 
we can evaluate these expressions using simple numerical methods. Indeed, in each case, we solve the 
stationary equations (4.13) and (4.16) using the rootfinding procedure fzero, which works well because the 
left-hand side of each equation is a monotone function of $\tau$. To evaluate the integral in (4.11), we use the 
command erfc. To evaluate the integral in (4.14), we use the quadrature function quadgk. 

We have encountered some numerical stability problems evaluating (4.11) when the proportional sparsity 
$\rho = s/d$ is close to zero or one. Similarly, there are sometimes difficulties with (4.14) when the proportional 
rank $\rho = r/m$ or the aspect ratio $\nu = m/n$ are close to zero or one. Nevertheless, relatively simple code based 
on this approach is usually reliable. 
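
The procedure for the $\ell_1$ curve can be sketched in a few lines. The Python code below is an illustration, not the authors' Matlab code: it replaces fzero with plain bisection, and it assumes that the normalized objective is the expression in (C.6) divided by the ambient dimension, with proportional sparsity $s/d$ as the only parameter. The Gaussian integral is rewritten through the complementary error function using a standard identity. Function names are hypothetical.

```python
import math


def expected_sq_dist(rho: float, tau: float) -> float:
    """Normalized objective rho*(1 + tau^2) + (1 - rho)*E[Pos(|g| - tau)^2],
    where the expectation over a standard normal g is written via erfc."""
    tail = (1.0 + tau * tau) * math.erfc(tau / math.sqrt(2.0))
    tail -= math.sqrt(2.0 / math.pi) * tau * math.exp(-0.5 * tau * tau)
    return rho * (1.0 + tau * tau) + (1.0 - rho) * tail


def stationary_gap(rho: float, tau: float) -> float:
    """Half of the tau-derivative of expected_sq_dist.  It is an
    increasing function of tau because the objective is convex."""
    phi = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(tau / math.sqrt(2.0))
    return rho * tau - 2.0 * (1.0 - rho) * (phi - tau * upper_tail)


def l1_stat_dim_bound(rho: float, tol: float = 1e-12) -> float:
    """Locate the stationary point by bisection (0 < rho < 1 assumed),
    then evaluate the objective there."""
    lo, hi = 0.0, 1.0
    while stationary_gap(rho, hi) < 0.0:   # expand until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stationary_gap(rho, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return expected_sq_dist(rho, 0.5 * (lo + hi))


if __name__ == "__main__":
    for rho in (0.05, 0.1, 0.25, 0.5, 0.75, 0.9):
        print(rho, l1_stat_dim_bound(rho))
```

Bisection is slower than a bracketing method such as fzero, but it relies only on the monotonicity noted above, and it degrades gracefully when the proportional sparsity approaches zero or one.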

A.3. Logistic regression. Several of the experiments involve fitting the logistic function 

\[ \ell(x) := \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \]

to the data, where $\beta_0, \beta_1 \in \mathbb{R}$ are parameters. We use the command glmfit to accomplish this task. The center 
$\mu := -\beta_0/\beta_1$ of the logistic function is the point such that $\ell(\mu) = \tfrac{1}{2}$. 
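
For readers without access to glmfit, the fit can be reproduced with a few Newton steps on the logistic log-likelihood. The Python sketch below is an assumption-laden stand-in, not the routine used in the experiments: it takes abscissas `xs` and empirical success frequencies `ps` in [0, 1], starts from zero, and uses no step-size control, which is adequate for well-conditioned data but is not a robust general-purpose solver.

```python
import math


def logistic(x: float, b0: float, b1: float) -> float:
    """Logistic function 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(xs, ps, iters: int = 50):
    """Newton's method for the two-parameter logistic likelihood with
    fractional responses ps.  Returns the pair (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, p in zip(xs, ps):
            q = logistic(x, b0, b1)
            w = q * (1.0 - q)
            g0 += p - q              # gradient of the log-likelihood
            g1 += (p - q) * x
            h00 += w                 # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1


if __name__ == "__main__":
    xs = list(range(11))
    ps = [logistic(x, -2.5, 0.5) for x in xs]   # synthetic frequencies
    b0, b1 = fit_logistic(xs, ps)
    print("center:", -b0 / b1)
```

The center reported in the figures corresponds to the ratio computed in the last line.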



Appendix B. Theoretical results on descent cones 

This appendix contains the theoretical analysis that permits us to calculate the statistical dimension of 
descent cones. In particular, we complete the proof of Proposition 4.4, and we establish Theorem 4.5. The 
material in this appendix has some overlap with independent work due to Oymak & Hassibi [OH12]. 

B.1. The expected distance to the subdifferential. To complete the proof of Proposition 4.4, we must show 
that the function $F : \tau \mapsto \mathbb{E}[\mathrm{dist}^2(g, \tau \cdot \partial f(x))]$ exhibits a number of analytic and geometric properties. The 
hypotheses of the proposition ensure that $\partial f(x)$ is a nonempty, compact, convex set that does not contain the 
origin. For clarity, we establish an abstract result that only depends on the distinguished properties of the 
subdifferential. Let us begin with a lemma about a related, but simpler, function. 

Lemma B.1 (Distance to a dilated set). Let $S$ be a nonempty, compact, convex subset of $\mathbb{R}^d$ that does not contain 
the origin. In particular, there are positive numbers $b$ and $B$ that satisfy $b \leq \|s\| \leq B$ for all $s \in S$. Fix a point $u \in \mathbb{R}^d$, and define the 
function 

\[ F_u : \tau \mapsto \mathrm{dist}^2(u, \tau S) \quad\text{for } \tau \geq 0. \tag{B.1} \]

The following properties hold. 

(1) The function $F_u$ is convex. 

(2) The function satisfies the lower bound 

\[ F_u(\tau) \geq (\tau b - \|u\|)^2 \quad\text{for all } \tau \geq \|u\|/b. \tag{B.2} \]

In particular, $F_u$ attains its minimum value in the interval $[0, 2\|u\|/b]$. 

(3) The function $F_u$ is continuously differentiable, and the derivative takes the form 

\[ F_u'(\tau) = -\frac{2}{\tau} \langle u - \pi_{\tau S}(u), \ \pi_{\tau S}(u) \rangle \quad\text{for } \tau > 0. \tag{B.3} \]

The right derivative $F_u'(0)$ exists, and $F_u'(0) = \lim_{\tau \downarrow 0} F_u'(\tau)$. 

(4) The derivative admits the bound 

\[ |F_u'(\tau)| \leq 2B (\|u\| + \tau B) \quad\text{for all } \tau \geq 0. \tag{B.4} \]

(5) Furthermore, the map $u \mapsto F_u'(\tau)$ is Lipschitz for each $\tau \geq 0$: 

\[ |F_u'(\tau) - F_y'(\tau)| \leq 2B \|u - y\| \quad\text{for all } u, y \in \mathbb{R}^d. \tag{B.5} \]

Proof. These claims all take some work. Along the way, we also need to establish some auxiliary results to 
justify the main points. 

Convexity. For $\tau > 0$, convexity follows from the representation 

\[ F_u(\tau) = \Big[ \inf_{s \in S} \|u - \tau s\| \Big]^2 = \Big[ \tau \cdot \inf_{s \in S} \|u/\tau - s\| \Big]^2 = \big[ \tau \cdot \mathrm{dist}(u/\tau, S) \big]^2. \tag{B.6} \]

By way of justification, the distance to a closed convex set is a convex function [Roc70, p. 34], the perspective 
transformation [HUL93a, Sec. IV.2.2] of a convex function is convex, and the square of a nonnegative convex 
function is convex by a direct calculation. 

Continuity. The representation (B.6) shows that the function $F_u$ is continuous for $\tau > 0$ because the distance 
to a convex set is a Lipschitz function, as stated in (3.5). To obtain continuity at $\tau = 0$, simply note that 

\[ |F_u(\varepsilon) - F_u(0)| = \big| \|u - \pi_{\varepsilon S}(u)\|^2 - \|u\|^2 \big| \leq 2 |\langle u, \pi_{\varepsilon S}(u) \rangle| + \|\pi_{\varepsilon S}(u)\|^2 \leq 2 \|u\| (\varepsilon B) + (\varepsilon B)^2 \to 0 \quad\text{as } \varepsilon \to 0. \]

Indeed, each point in $\varepsilon S$ is bounded in norm by $\varepsilon B$, so the projection $\pi_{\varepsilon S}(u)$ admits the same bound. Continuity 
implies that $F_u$ is convex on the entire range $\tau \geq 0$. 

Attainment of minimum. Assume that $\tau \geq \|u\|/b$. Then 

\[ \mathrm{dist}(u, \tau S) = \inf_{s \in S} \|\tau s - u\| \geq \inf_{s \in S} \big[ \tau \|s\| - \|u\| \big] \geq \tau b - \|u\| \geq 0. \]

Square this relation to reach (B.2). It follows that $F_u(\tau) \geq F_u(0) = \|u\|^2$ for all $\tau \geq 2\|u\|/b$. Therefore, any 
minimizer of $F_u$ must occur in the compact interval $[0, 2\|u\|/b]$. Since $F_u$ is continuous, it attains its minimal 
value in this range. 

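Differentiability. We obtain the derivative from a direct calculation: 

\begin{align*}
F_u'(\tau) &= \frac{\mathrm{d}}{\mathrm{d}\tau} \big[ \tau^2 \, \mathrm{dist}^2(u/\tau, S) \big] = 2\tau \cdot \mathrm{dist}^2(u/\tau, S) + \tau^2 \big\langle 2 \big( (u/\tau) - \pi_S(u/\tau) \big), \ -u/\tau^2 \big\rangle \\
&= \frac{2}{\tau} \big[ \mathrm{dist}^2(u, \tau S) - \langle u - \pi_{\tau S}(u), \ u \rangle \big] = -\frac{2}{\tau} \langle u - \pi_{\tau S}(u), \ \pi_{\tau S}(u) \rangle.
\end{align*}

The first relation follows from (B.6). The second relies on the formula (3.6) for the derivative of the squared 
distance. To obtain the fourth relation, we express the squared distance as $\|u - \pi_{\tau S}(u)\|^2$. 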

Right derivative at zero. The right derivative $F_u'(0)$ exists, and the limit formula holds because $F_u$ is a proper 
convex function that is continuous on $[0, \infty)$ and differentiable on $(0, \infty)$; see [Roc70, Thm. 24.1]. 

Continuity of the derivative. The expression (B.3) already implies that $F_u'$ is continuous for $\tau > 0$ because the 
projection onto a convex set is continuous [RW98, Thm. 2.26]. Continuity of the derivative at zero follows 
from the limit formula for the right derivative at zero. 

Bound for the derivative. Given the formula (B.3), it is easy to control the derivative when $\tau > 0$: 

\[ |F_u'(\tau)| \leq \frac{2}{\tau} \|u - \pi_{\tau S}(u)\| \, \|\pi_{\tau S}(u)\| \leq \frac{2}{\tau} (\|u\| + \tau B)(\tau B) = 2B (\|u\| + \tau B). \]

We obtain the estimate for $\tau = 0$ by taking the limit. 
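
The derivative formula (B.3) and the bound (B.4) are easy to test numerically when projections are available in closed form. The Python sketch below takes the set to be a disk in the plane that excludes the origin, which is a hypothetical example chosen only because dilating and projecting onto a disk is elementary. It compares the formula with a central finite difference.

```python
import math


def project_onto_scaled_disk(u, c, r, tau):
    """Projection of u onto tau*S, where S = {s : ||s - c|| <= r}."""
    cx, cy = tau * c[0], tau * c[1]
    dx, dy = u[0] - cx, u[1] - cy
    norm = math.hypot(dx, dy)
    if norm <= tau * r:
        return u                      # u already lies in tau*S
    scale = tau * r / norm
    return (cx + scale * dx, cy + scale * dy)


def F(u, c, r, tau):
    """Squared distance from u to the dilated disk tau*S."""
    p = project_onto_scaled_disk(u, c, r, tau)
    return (u[0] - p[0]) ** 2 + (u[1] - p[1]) ** 2


def F_prime(u, c, r, tau):
    """Derivative formula (B.3): -(2/tau) * <u - proj, proj>, tau > 0."""
    p = project_onto_scaled_disk(u, c, r, tau)
    inner = (u[0] - p[0]) * p[0] + (u[1] - p[1]) * p[1]
    return -2.0 * inner / tau


if __name__ == "__main__":
    u, c, r, tau, h = (1.0, 2.0), (3.0, 0.0), 1.0, 0.7, 1e-6
    finite_diff = (F(u, c, r, tau + h) - F(u, c, r, tau - h)) / (2 * h)
    B = math.hypot(*c) + r            # largest norm of a point in S
    print(finite_diff, F_prime(u, c, r, tau))
    print(abs(F_prime(u, c, r, tau)) <= 2 * B * (math.hypot(*u) + tau * B))
```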

Lipschitz property. We obtain the Lipschitz bound (B.5) from (B.3) after some effort. Fix $\tau > 0$. The 
optimality condition [HUL93a, Thm. III.3.1.1] for a projection onto a closed convex set implies that 

\[ \langle y - \pi_{\tau S}(y), \ \pi_{\tau S}(y) \rangle \geq \langle y - \pi_{\tau S}(y), \ \pi_{\tau S}(u) \rangle \quad\text{for all } u, y \in \mathbb{R}^d. \]

As a consequence, 

\begin{align*}
\langle u - \pi_{\tau S}(u), \ \pi_{\tau S}(u) \rangle - \langle y - \pi_{\tau S}(y), \ \pi_{\tau S}(y) \rangle &\leq \langle (u - \pi_{\tau S}(u)) - (y - \pi_{\tau S}(y)), \ \pi_{\tau S}(u) \rangle \\
&\leq \| (\mathbf{I} - \pi_{\tau S})(u) - (\mathbf{I} - \pi_{\tau S})(y) \| \, \|\pi_{\tau S}(u)\| \leq \|u - y\| \cdot (\tau B).
\end{align*}

The last relation relies on the fact (3.4) that the map $\mathbf{I} - \pi_{\tau S}$ is nonexpansive. Reversing the roles of $u$ and $y$ 
in the last calculation, we see that 

\[ \big| \langle u - \pi_{\tau S}(u), \ \pi_{\tau S}(u) \rangle - \langle y - \pi_{\tau S}(y), \ \pi_{\tau S}(y) \rangle \big| \leq (\tau B) \cdot \|u - y\|. \]

Combining this estimate with the expression (B.3) for the derivative, we reach 

\[ |F_u'(\tau) - F_y'(\tau)| \leq 2B \cdot \|u - y\|. \]

For $\tau = 0$, the result follows when we take the limit as $\tau \downarrow 0$. □ 

With this result at hand, we are prepared to prove a lemma that confirms the remaining claims from 
Proposition 4.4. 

Lemma B.2 (Expected distance to a dilated set). Let $S$ be a nonempty, compact, convex subset of $\mathbb{R}^d$ that does 
not contain the origin. Then the function 

\[ F(\tau) := \mathbb{E}[\mathrm{dist}^2(g, \tau S)] = \mathbb{E}[F_g(\tau)] \quad\text{for } \tau \geq 0 \]

is strictly convex, continuous at $\tau = 0$, and differentiable for $\tau \geq 0$. It attains its minimum at a unique point. 
Furthermore, 

\[ F'(\tau) = \mathbb{E}[F_g'(\tau)] \quad\text{for all } \tau \geq 0. \tag{B.7} \]

For $\tau = 0$, we interpret $F'(\tau)$ as a right derivative. 

Proof. These properties will follow from Lemma B.1, and we continue using the notation from this result. 

Convexity. The function $F$ is convex for $\tau \geq 0$ because it is an average of functions of the form $F_g$, each of 
which is convex. 

Strict convexity. We argue by contradiction. Since $F$ is convex, if $F$ were not strictly convex, its graph would 
contain a linear segment. More precisely, there would be numbers $0 \leq \rho < \tau$ and $\eta \in (0, 1)$ for which 

\[ \mathbb{E}\big[ F_g(\eta \rho + (1 - \eta) \tau) \big] = \mathbb{E}\big[ \eta \cdot F_g(\rho) + (1 - \eta) \cdot F_g(\tau) \big]. \tag{B.8} \]

The convexity of $F_g$ ensures that, for each $g$, the bracket on the right-hand side is no smaller than the bracket 
on the left-hand side. Therefore, the relation (B.8) holds if and only if the two brackets are equal almost 
surely with respect to the Gaussian measure. But note that 

\begin{align*}
F_0(\eta \rho + (1 - \eta) \tau) &= \mathrm{dist}^2\big(0, (\eta \rho + (1 - \eta) \tau) S\big) = (\eta \rho + (1 - \eta) \tau)^2 \cdot \inf_{s \in S} \|s\|^2 \\
&< \big( \eta \rho^2 + (1 - \eta) \tau^2 \big) \cdot \inf_{s \in S} \|s\|^2 = \eta \cdot \mathrm{dist}^2(0, \rho S) + (1 - \eta) \cdot \mathrm{dist}^2(0, \tau S) = \eta \cdot F_0(\rho) + (1 - \eta) \cdot F_0(\tau).
\end{align*}

The strict inequality depends on the strict convexity of the square, together with the fact that the infimum is 
strictly positive. On account of (3.5), the squared distance to a convex set is a continuous function, so there is 
an open ball around the origin where the same relation holds. That is, for some $\varepsilon > 0$, 

\[ F_u(\eta \rho + (1 - \eta) \tau) < \eta \cdot F_u(\rho) + (1 - \eta) \cdot F_u(\tau) \quad\text{when } \|u\| < \varepsilon. \]

This statement contravenes (B.8). 

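Continuity at zero. Imitating the continuity argument in Lemma B.1, we find that 

\[ |F(\varepsilon) - F(0)| = \big| \mathbb{E}[F_g(\varepsilon) - F_g(0)] \big| \leq \mathbb{E}\big[ 2 \|g\| \, \|\pi_{\varepsilon S}(g)\| + \|\pi_{\varepsilon S}(g)\|^2 \big] \leq 2\sqrt{d} \cdot (\varepsilon B) + (\varepsilon B)^2 \to 0 \quad\text{as } \varepsilon \to 0. \]

This is all that is required. 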

Differentiability. This point follows from a routine application of the Dominated Convergence Theorem. 
Indeed, for every $\tau \geq 0$, the function $F(\tau) = \mathbb{E}[F_g(\tau)]$ takes a finite value, and Lemma B.1 establishes that $F_g$ is 
continuously differentiable. For each compact interval $I \subset [0, \infty)$, the bound (B.4) ensures that 

\[ \mathbb{E} \sup_{\tau \in I} |F_g'(\tau)| \leq \mathbb{E} \sup_{\tau \in I} \big[ 2B (\|g\| + \tau B) \big] \leq 2B\sqrt{d} + 2B^2 \sup_{\tau \in I} \tau < \infty. \]

The convergence theorem now implies that $F'(\tau) = \frac{\mathrm{d}}{\mathrm{d}\tau} \mathbb{E}[F_g(\tau)] = \mathbb{E}[F_g'(\tau)]$ for all $\tau \geq 0$. 

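Attainment of minimum. The median of the random variable $\|g\|$ does not exceed $\sqrt{d}$. Therefore, when 
$\tau b \geq \sqrt{d}$, we have 

\[ F(\tau) \geq \mathbb{E}\big[ F_g(\tau) \,\big|\, \|g\| \leq \sqrt{d} \big] \cdot \mathbb{P}\{ \|g\| \leq \sqrt{d} \} \geq \tfrac{1}{2} \mathbb{E}\big[ (\tau b - \|g\|)^2 \,\big|\, \|g\| \leq \sqrt{d} \big] \geq \tfrac{1}{2} (\tau b - \sqrt{d})^2. \]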

The first inequality follows from the law of total expectation, and the second depends on (B.2). In particular, 
$F(\tau) > F(0) = d$ when $\tau \geq 3 b^{-1} \sqrt{d}$, because then $\tfrac{1}{2} (\tau b - \sqrt{d})^2 \geq 2d$. Thus, any minimizer of $F$ must occur in the compact interval $[0, 3 b^{-1} \sqrt{d}]$. 
Since $F$ is strictly convex and continuous, it attains its minimum at a unique point. □ 

B.2. Error bound for descent cone calculations. In this section, we prove Theorem 4.5, which provides an 
error bound for Proposition 4.4. We require a standard result concerning the variance of a Lipschitz function 
of a standard normal vector. 

Fact B.3 (Variance of a Lipschitz function). Let $H : \mathbb{R}^d \to \mathbb{R}$ be a function that is Lipschitz with respect to the 
Euclidean norm: 

\[ |H(u) - H(y)| \leq M \cdot \|u - y\| \quad\text{for all } u, y \in \mathbb{R}^d. \]

Then 

\[ \mathrm{Var}(H(g)) \leq M^2, \tag{B.9} \]

where $g$ is a standard normal vector. 

Fact B.3 is a consequence of the Gaussian Poincaré inequality; see [Bog98, Thm. 1.6.4] or [Led01, p. 49]. 

Proof of Theorem 4.5. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a norm, and fix a nonzero point $x \in \mathbb{R}^d$. According to [HUL93a, 
Ex. VI.3.1], the subdifferential of the norm satisfies 

\[ S = \partial f(x) = \{ s \in \mathbb{R}^d : \langle s, x \rangle = f(x) \text{ and } f^\circ(s) = 1 \}, \tag{B.10} \]

where $f^\circ$ is the norm dual to $f$. Thus, $S$ is nonempty, compact, convex, and it does not contain the origin. 
As in Lemmas B.1 and B.2, we introduce the functions 

\[ F_u : \tau \mapsto \mathrm{dist}^2(u, \tau S) \quad\text{and}\quad F : \tau \mapsto \mathbb{E}[F_g(\tau)], \]

where $g$ is a standard normal vector. Proposition 4.4 provides the upper bound $\mathbb{E} \inf_{\tau \geq 0} F_g(\tau) \leq \inf_{\tau \geq 0} F(\tau)$. 
Our objective is to develop a reverse inequality. 

We establish the result by linearizing each function $F_g$ around a suitable point. Lemma B.2 shows that the 
function $F$ attains its minimum at a unique location, so we may define 

\[ \tau_\star := \operatorname*{arg\,min}_{\tau \geq 0} F(\tau). \]

Similarly, for each $u \in \mathbb{R}^d$, Lemma B.1 shows that $F_u$ attains its minimum at some point $\tau_u \geq 0$. For the 
moment, it does not matter how we select this minimizer. Since $F_u$ is convex and differentiable, we can bound 
its minimum value below using the tangent at $\tau_\star$. That is, 

\[ \inf_{\tau \geq 0} F_u(\tau) = F_u(\tau_u) \geq F_u(\tau_\star) + (\tau_u - \tau_\star) \cdot F_u'(\tau_\star). \]

Should $\tau_\star = 0$, we interpret $F_u'(\tau_\star)$ as a right derivative. Replacing $u$ by the random vector $g$ and taking the 
expectation, we reach 

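\begin{align*}
\mathbb{E}\Big[ \inf_{\tau \geq 0} F_g(\tau) \Big] &\geq \mathbb{E}[F_g(\tau_\star)] + \mathbb{E}\big[ (\tau_g - \tau_\star) \cdot F_g'(\tau_\star) \big] \\
&= F(\tau_\star) + \mathbb{E}\big[ (\tau_g - \mathbb{E}[\tau_g]) \cdot (F_g'(\tau_\star) - \mathbb{E}[F_g'(\tau_\star)]) \big] + \mathbb{E}[\tau_g - \tau_\star] \cdot \mathbb{E}[F_g'(\tau_\star)] \\
&\geq \inf_{\tau \geq 0} F(\tau) - \big[ \mathrm{Var}(\tau_g) \cdot \mathrm{Var}(F_g'(\tau_\star)) \big]^{1/2} + \mathbb{E}[\tau_g - \tau_\star] \cdot F'(\tau_\star). \tag{B.11}
\end{align*}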

The second inequality depends on Cauchy-Schwarz, and we have invoked (B.7) to identify the derivative of 
F. From here, we obtain the conclusion (4.10) as soon as we estimate the two error terms. The advantage of 
this formulation is that the Lipschitz properties of the random variables allow us to control their variances. 
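
For completeness, the two elementary steps behind the last two lines of the preceding display can be written out as follows. This is a sketch of the standard covariance decomposition and the Cauchy-Schwarz inequality; the symbol $\operatorname{Cov}$ is introduced here only for illustration and does not appear elsewhere in the appendix.

```latex
\begin{align*}
\mathbb{E}\big[(\tau_g - \tau_\star)\, F_g'(\tau_\star)\big]
  &= \operatorname{Cov}\big(\tau_g,\, F_g'(\tau_\star)\big)
   + \mathbb{E}[\tau_g - \tau_\star] \cdot \mathbb{E}\big[F_g'(\tau_\star)\big], \\
\big|\operatorname{Cov}\big(\tau_g,\, F_g'(\tau_\star)\big)\big|
  &\leq \big[\operatorname{Var}(\tau_g) \cdot
        \operatorname{Var}\big(F_g'(\tau_\star)\big)\big]^{1/2}.
\end{align*}
```

The first identity holds because the constant $\tau_\star$ does not affect the covariance, and the second is Cauchy-Schwarz applied to the centered random variables.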

First, let us demonstrate that the last term on the right-hand side of (B.11) is nonnegative. Abbreviate 
$e_1 := \mathbb{E}[\tau_g - \tau_\star] \cdot F'(\tau_\star)$. There are two possibilities to consider. When $\tau_\star > 0$, the derivative $F'(\tau_\star) = 0$ because 
$\tau_\star$ minimizes $F$. Thus, $e_1 = 0$. On the other hand, when $\tau_\star = 0$, it must be the case that the right derivative 
$F'(\tau_\star) \geq 0$, or else the minimum of the convex function $F$ would occur at a strictly positive value. Since $\tau_g \geq 0$, 
we see that the quantity $e_1 \geq 0$. 

To compute the variance of $\tau_g$, we need to devise a consistent method for selecting a minimizer $\tau_u$ of $F_u$. 
Introduce the closed convex cone $K := \mathrm{cone}(S)$, and notice that 

\[ \inf_{\tau \geq 0} F_u(\tau) = \inf_{\tau \geq 0} \mathrm{dist}^2(u, \tau S) = \mathrm{dist}^2(u, K). \]

In other words, the minimum distance to one of the sets $\tau S$ is attained at the point $\Pi_K(u)$. As such, it is 
natural to pick a minimizer $\tau_u$ of $F_u$ according to the rule 

\[ \tau_u := \inf\{ \tau \geq 0 : \Pi_K(u) \in \tau S \} = \frac{\langle \Pi_K(u), \ x \rangle}{f(x)}. \tag{B.12} \]

The latter identity follows from the expression (B.10) for the subdifferential $S$. In light of (B.12), 

\[ |\tau_u - \tau_y| = \frac{1}{f(x)} \big| \langle \Pi_K(u) - \Pi_K(y), \ x \rangle \big| \leq \frac{\|x\|}{f(x)} \cdot \|\Pi_K(u) - \Pi_K(y)\| \leq \frac{\|x\|}{f(x)} \cdot \|u - y\|. \]

We have used the fact (3.4) that the projection onto a closed convex set is nonexpansive. Fact B.3 delivers 

\[ [\mathrm{Var}(\tau_g)]^{1/2} \leq \frac{\|x\|}{f(x)}. \tag{B.13} \]

Finally, let us turn to the remaining variance term in (B.11). We have already computed the Lipschitz 
bound we need for the analysis. Indeed, the inequality (B.5) states that 

\[ |F_u'(\tau_\star) - F_y'(\tau_\star)| \leq \Big[ 2 \sup_{s \in S} \|s\| \Big] \cdot \|u - y\|. \]

Another invocation of Fact B.3 delivers the estimate 

\[ [\mathrm{Var}(F_g'(\tau_\star))]^{1/2} \leq 2 \sup_{s \in S} \|s\|. \tag{B.14} \]

To complete the proof, we combine the inequalities (B.11), (B.13), (B.14), and the fact that $e_1 \geq 0$. This is 
the advertised result (4.10). □ 

Appendix C. Statistical dimension calculations 

This appendix contains the details of the calculations of the statistical dimension for several families of 
convex cones: circular cones, $\ell_1$ descent cones, and Schatten 1-norm descent cones. 

C.1. Circular cones. First, we approximate the statistical dimension of a circular cone. 

Proof of Proposition 4.3. We begin with an exact integral expression for the statistical dimension of the circular 
cone $C = \mathrm{Circ}_d(\alpha)$. The spherical formulation (4.2) of the statistical dimension asks us to average the squared 
norm of the projection of a random unit vector $\theta$ onto the cone. Introduce the angle $\beta := \beta(\theta) := \arccos(\theta_1)$ 
between $\theta$ and the first standard basis vector $(1, 0, \dots, 0)$. Elementary trigonometry shows that the squared 
norm of the projection of $\theta$ onto the cone $C$ admits the expression 

\[ F(\beta) := \|\Pi_C(\theta)\|^2 = \begin{cases} 1, & 0 \leq \beta < \alpha, \\ \cos^2(\beta - \alpha), & \alpha \leq \beta < \tfrac{\pi}{2} + \alpha, \\ 0, & \tfrac{\pi}{2} + \alpha \leq \beta \leq \pi. \end{cases} \]

To obtain the exact statistical dimension $\delta(C)$ from (4.2), we integrate $F(\beta)$ in polar coordinates in the usual 
way (cf. [SW08, Lem. 6.5.1]): 


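\[ \delta(C) = d \cdot \frac{\Gamma(\tfrac{1}{2} d)}{\sqrt{\pi} \, \Gamma(\tfrac{1}{2}(d-1))} \int_0^\pi \sin^{d-2}(\beta) \, F(\beta) \, \mathrm{d}\beta. \tag{C.1} \]

We can approximate the integral by a routine application of Laplace's method [AF03, Lem. 6.2.3]: 

\[ \int_0^\pi \sin^{d-2}(\beta) \, F(\beta) \, \mathrm{d}\beta = \sqrt{\frac{2\pi}{d}} \, \sin^2(\alpha) + O(d^{-3/2}). \]

To simplify the ratio of gamma functions, recall Gautschi's inequality [OLBC10, Sec. 5.6.4]: 

\[ \sqrt{\tfrac{1}{2}(d-2)} < \frac{\Gamma(\tfrac{1}{2} d)}{\Gamma(\tfrac{1}{2}(d-1))} < \sqrt{\tfrac{1}{2} d}. \]

Combine the last three displays to reach the expression (4.7). 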

To obtain the more refined estimate $\cos(2\alpha)$ for the error term, one may use the fact that the intrinsic 
volumes of a circular cone satisfy 

\[ v_k(\mathrm{Circ}_d(\alpha)) = \frac{1}{2} \binom{\tfrac{1}{2}(d-2)}{\tfrac{1}{2}(k-1)} \sin^{k-1}(\alpha) \cos^{d-k-1}(\alpha) \quad\text{for } k = 1, \dots, d-1. \tag{C.2} \]

This formula is drawn from [Ame11, Ex. 4.4.8]. We are using the analytic extension to define the binomial 
coefficient. The easiest way to study this sequence is to observe the close connection with the density of a 
binomial random variable and to apply the interlacing result, Proposition 5.6. In the interest of brevity, we 
omit the details. □ 
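
The exact integral (C.1) is also straightforward to evaluate numerically, which gives a way to compare the statistical dimension of a circular cone with the leading term $d \sin^2(\alpha)$. The Python sketch below uses composite Simpson quadrature and the log-gamma function for the normalizing constant; the function names, the quadrature rule, and the number of nodes are choices made for this illustration and are not taken from the paper.

```python
import math


def proj_sq_norm(beta: float, alpha: float) -> float:
    """Squared norm of the projection of a unit vector, at angle beta
    from the axis, onto the circular cone with half-angle alpha."""
    if beta < alpha:
        return 1.0
    if beta < math.pi / 2 + alpha:
        return math.cos(beta - alpha) ** 2
    return 0.0


def circular_cone_stat_dim(d: int, alpha: float, n: int = 20000) -> float:
    """Evaluate d * Gamma(d/2) / (sqrt(pi) * Gamma((d-1)/2)) times the
    integral of sin^(d-2)(beta) * F(beta) over [0, pi] by composite
    Simpson quadrature with n subintervals (n must be even)."""
    log_ratio = math.lgamma(d / 2) - math.lgamma((d - 1) / 2)
    const = d * math.exp(log_ratio) / math.sqrt(math.pi)
    h = math.pi / n
    total = 0.0
    for i in range(n + 1):
        beta = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * math.sin(beta) ** (d - 2) * proj_sq_norm(beta, alpha)
    return const * total * h / 3.0


if __name__ == "__main__":
    d, alpha = 100, math.pi / 6
    delta = circular_cone_stat_dim(d, alpha)
    print(delta, d * math.sin(alpha) ** 2, math.cos(2 * alpha))
```

Printing the difference between the computed value and $d \sin^2(\alpha)$ next to $\cos(2\alpha)$ gives an informal check of the refined error term.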

C.2. Descent cones of the $\ell_1$ norm. In this section, we show how to use Recipe 4.1 and Theorem 4.5 to 
compute the statistical dimension of the descent cone of the $\ell_1$ norm at a sparse vector. This is a warmup for 
the more difficult, but entirely similar, calculation in the next section. 

Proof of Proposition 4.7. Since the $\ell_1$ norm is invariant under signed permutations, we may assume that the 
sparse vector $x \in \mathbb{R}^d$ takes the form $x = (x_1, \dots, x_s, 0, \dots, 0)$, where $x_i > 0$. To compute $\delta(\mathcal{D}(\|\cdot\|_1, x))$, we use the 
subdifferential bound (4.8) for the statistical dimension of a descent cone: 

\[ \delta(\mathcal{D}(\|\cdot\|_1, x)) \leq \inf_{\tau \geq 0} \mathbb{E}\big[ \mathrm{dist}^2(g, \tau \cdot \partial \|x\|_1) \big]. \tag{C.3} \]

Observe that the subdifferential of the £\ norm at x has the following structure: 

I Ui = 1, i = 1....,5 ,„ 
HEdllxlh «■ \ l ' . (C.4) 
[|Wil<l, i = s+l,...,d. 

We can compute the distance from a standard normal vector g to the dilated subdifferential as follows. 

dist 2 (g,T-d||x|| 1 ) = £(g i --T) 2 + £ P0S 2 (|gi|-T), 
i=l i=s+l 

where Pos(a) := avO and the operator v returns the maximum of two numbers. Indeed, we always suffer an 
error in the first 5 components, and we can always reduce the magnitude of the other components by the 
amount t. Taking the expectation, we reach 

-/ (u-x) 2 e~" 12 du. (C.5) 

71 Jr 

Simplify the integral, and introduce the resulting expression into (C.3). We reach 

{/~2~ r°° 2 2 1 1 

s(l + T 2 ) + y- [d-s) (1 + t 2 )J e~" l2 du-ie~ T 12 k (C.6) 

This expression coincides with the upper bound in (4.12), which has been normalized by the ambient 
dimension d. 

Now, we need to invoke the error estimate, Theorem 4.5. An inspection of (C.4) shows that the subdif- 
ferential d||x||i depends on the number s of nonzero entries in x but not on their magnitudes. It follows 
from (3.3) that, up to isometry the descent cone @(||-|li ,x) only depends on the sparsity. Therefore, we may 
as well assume that x= (1,...,1,0,...,0). For this vector, ||x/ ||x||||i = y/s. Second, the expression (C.4) for the 
subdifferential shows that || u\\ < \fd for every subgradient ued ||x||. Therefore, the error in the inequality (C.6) 
is at most 2\/d/s. We reach the lower bound in (4.12). 

Finally, Lemma B.2 shows that the brace in (C.6) is a strictly convex, differ entiable function of t with a 
unique minimizer. It can be verified that the minimum does not occur at t = 0. Therefore, we determine the 
stationary equation (4.13) by setting the derivative of the brace to zero and simplifying. □ 
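The bound (C.6) is easy to evaluate numerically. The sketch below, in Python, writes the Gaussian tail integral in closed form via the complementary error function and minimizes the brace over $\tau$ by golden-section search, which is justified because the brace is strictly convex. The function names, the search interval, and the iteration count are illustrative assumptions rather than anything prescribed in the text.

```python
import math

def l1_objective(tau, s, d):
    """The brace in (C.6) for sparsity s and ambient dimension d."""
    # int_tau^inf exp(-u^2/2) du = sqrt(pi/2) * erfc(tau / sqrt(2))
    tail = math.sqrt(math.pi / 2) * math.erfc(tau / math.sqrt(2))
    bracket = (1 + tau**2) * tail - tau * math.exp(-tau**2 / 2)
    return s * (1 + tau**2) + math.sqrt(2 / math.pi) * (d - s) * bracket

def l1_statdim_bound(s, d, tau_max=10.0, iters=200):
    """Minimize the convex objective over [0, tau_max] by golden-section search."""
    phi = (math.sqrt(5) - 1) / 2
    lo, hi = 0.0, tau_max
    for _ in range(iters):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if l1_objective(a, s, d) < l1_objective(b, s, d):
            hi = b
        else:
            lo = a
    tau = (lo + hi) / 2
    return l1_objective(tau, s, d), tau

bound, tau_star = l1_statdim_bound(s=10, d=100)
print(bound / 100, tau_star)  # normalized upper bound and its minimizer
```

At $\tau = 0$ the brace equals $d$, and the minimizer satisfies the first-order condition $s\tau = (d - s)\sqrt{2/\pi} \int_\tau^\infty (u - \tau) e^{-u^2/2}\, du$, which is what setting the derivative of the brace to zero yields.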

C.3. Descent cones of the Schatten 1-norm. Now, we present the calculation of the statistical dimension
of the descent cone of the Schatten 1-norm at a low-rank matrix. The approach is entirely similar to the
argument in Appendix C.2.

Proof of Proposition 4.8. Our aim is to identify the statistical dimension of the descent cone of the Schatten 
1-norm at a fixed low-rank matrix. The argument here parallels the proof of Proposition 4.7, but we use 
classical results from random matrix theory to obtain the final expression. Our asymptotic theory demonstrates 
that this simplification still results in a sharp estimate. 

We begin with the fixed-dimension setting. Consider an $m \times n$ real matrix $X$ with rank $r$. Without loss of
generality, we assume that $m \le n$ and $0 < r \le m$. The Schatten 1-norm is unitarily invariant, so we can also
assume that $X$ takes the form

$$X = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \quad\text{where } \Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_r) \text{ and } \sigma_i > 0 \text{ for } i = 1, \dots, r.$$

The subdifferential bound (4.8) for the statistical dimension of a descent cone states that

$$\delta(\mathcal{D}(\|\cdot\|_{S_1}, X)) \le \inf_{\tau \ge 0} \mathbb{E}\big[ \mathrm{dist}^2(G, \tau \cdot \partial\|X\|_{S_1}) \big], \tag{C.7}$$

where we compute distance with respect to the Frobenius norm. The $m \times n$ matrix $G$ has independent standard
normal entries, and it is partitioned conformally with $X$:

$$G = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix} \quad\text{where $G_{11}$ is $r \times r$ and $G_{22}$ is $(m - r) \times (n - r)$.}$$

According to [Wat92, Ex. 2], the subdifferential of the Schatten 1-norm at $X$ takes the form

$$\partial\|X\|_{S_1} = \left\{ \begin{bmatrix} I_r & 0 \\ 0 & W \end{bmatrix} : \sigma_1(W) \le 1 \right\}, \tag{C.8}$$

where $\sigma_1(W)$ denotes the maximum singular value of $W$. It follows that

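$$\mathrm{dist}^2(G, \tau \cdot \partial\|X\|_{S_1}) = \left\| \begin{bmatrix} G_{11} - \tau I_r & G_{12} \\ G_{21} & 0 \end{bmatrix} \right\|_F^2 + \inf_{\sigma_1(W) \le 1} \|G_{22} - \tau W\|_F^2.$$

Using the Hoffman-Wielandt Theorem [HJ90, Cor. 7.3.8], we can derive

$$\inf_{\sigma_1(W) \le 1} \|G_{22} - \tau W\|_F^2 = \inf_{\sigma_1(W) \le 1} \sum_{i=1}^{m-r} \big( \sigma_i(G_{22}) - \tau \sigma_i(W) \big)^2 = \sum_{i=1}^{m-r} \mathrm{Pos}^2\big( \sigma_i(G_{22}) - \tau \big),$$

where $\sigma_i(\cdot)$ is the $i$th largest singular value. Combining the last two displays and taking the expectation,

$$\mathbb{E}\big[ \mathrm{dist}^2(G, \tau \cdot \partial\|X\|_{S_1}) \big] = r(m + n - r + \tau^2) + \mathbb{E}\left[ \sum_{i=1}^{m-r} \mathrm{Pos}^2\big( \sigma_i(G_{22}) - \tau \big) \right]. \tag{C.9}$$

Introduce this expression into (C.7):

$$\delta(\mathcal{D}(\|\cdot\|_{S_1}, X)) \le \inf_{\tau \ge 0} \left\{ r(m + n - r + \tau^2) + \mathbb{E}\left[ \sum_{i=1}^{m-r} \mathrm{Pos}^2\big( \sigma_i(G_{22}) - \tau \big) \right] \right\}. \tag{C.10}$$

We reach a nonasymptotic bound on the statistical dimension.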

Next, we apply the error bound, Theorem 4.5. The expression (C.8) shows that the subdifferential $\partial\|X\|_{S_1}$
depends only on the rank of the matrix $X$ and its dimension, so the descent cone $\mathcal{D}(\|\cdot\|_{S_1}, X)$ has the same
invariance. Therefore, we may consider the $m \times n$ rank-$r$ matrix $X = I_r \oplus 0$, which verifies $\| X / \|X\|_F \|_{S_1} = \sqrt{r}$.
Each subgradient $Y \in \partial\|X\|_{S_1}$ satisfies the norm bound $\|Y\|_F \le \sqrt{m}$. We conclude that the error in (C.10) is no
worse than $2\sqrt{m/r}$.

It is challenging to evaluate the formula (C.10) exactly. In principle, we could accomplish this task using
the joint singular value density [And84, p. 534] of the Gaussian matrix $G_{22}$. Instead, we set up a framework
in which we can use classical random matrix theory to obtain a sharp asymptotic result.

Consider an infinite sequence $\{X(r, m, n)\}$ of matrices, where $X(r, m, n)$ has rank $r$ and dimension $m \times n$ with
$m \le n$. For simplicity, we assume that the problem parameters $r, m, n \to \infty$ with constant ratios $r/m =: \rho \in (0, 1)$
and $m/n =: \nu \in (0, 1]$. The general case follows from a continuity argument. After a change of variables
$\tau \mapsto \tau\sqrt{n - r}$ and a rescaling, the expression (C.9) leads to

$$\frac{1}{mn}\, \mathbb{E}\big[ \mathrm{dist}^2(G, \tau\sqrt{n - r} \cdot \partial\|X(r, m, n)\|_{S_1}) \big] = \rho\nu + \rho(1 - \rho\nu)(1 + \tau^2) + (1 - \rho)(1 - \rho\nu) \cdot \mathbb{E}\left[ \frac{1}{m - r} \sum_{i=1}^{m-r} \mathrm{Pos}^2\big( \sigma_i(Z) - \tau \big) \right]. \tag{C.11}$$

Here, $G$ is an $m \times n$ standard normal matrix. The matrix $Z$ has dimension $(m - r) \times (n - r)$, and its entries are
independent normal$(0, (n - r)^{-1})$ random variables.

Observe that the expectation in (C.11) can be viewed as a spectral function of a Gaussian matrix. We can
obtain the limiting value of this expectation from a variant of the Marcenko-Pastur Law [MP67].

Fact C.1 (Spectral functions of a Gaussian matrix). Fix a continuous function $F : \mathbb{R}_+ \to \mathbb{R}$. Suppose $p, q \to \infty$
and $p/q \to y \in (0, 1]$. Let $Z_{pq}$ be a $p \times q$ matrix with independent normal$(0, q^{-1})$ entries. Then

$$\mathbb{E}\left[ \frac{1}{p} \sum_{i=1}^{p} F\big( \sigma_i(Z_{pq}) \big) \right] \to \int_{a_-}^{a_+} F(u) \cdot \varphi_y(u)\, du.$$

The limits $a_\pm := 1 \pm \sqrt{y}$. The kernel $\varphi_y$ is a probability density supported on $[a_-, a_+]$:

$$\varphi_y(u) := \frac{1}{\pi y u} \sqrt{(u^2 - a_-^2)(a_+^2 - u^2)} \quad\text{for } u \in [a_-, a_+].$$

Fact C.1 is usually stated differently in the literature. The result here follows from the almost sure weak
convergence of the empirical spectral density of a sample covariance matrix to the Marcenko-Pastur density
[BS10, Thm. 3.6] and the almost sure convergence of the extreme eigenvalues of a sample covariance
matrix [BS10, Thm. 5.8], followed by a change of variables in the integral. We omit the uninteresting details
of this reduction.

Let us apply Fact C.1 to our problem. The limiting aspect ratio $y$ of the matrix $Z$ satisfies

$$y = \frac{m - r}{n - r} = \frac{\nu(1 - \rho)}{1 - \rho\nu}.$$

As $r, m, n \to \infty$, we obtain the limit

$$\mathbb{E}\left[ \frac{1}{m - r} \sum_{i=1}^{m-r} \mathrm{Pos}^2\big( \sigma_i(Z) - \tau \big) \right] \to \int_{a_-}^{a_+} \mathrm{Pos}^2(u - \tau) \cdot \varphi_y(u)\, du.$$

Simplifying the latter integral and introducing it into (C.11), we reach

$$\frac{1}{mn}\, \mathbb{E}\big[ \mathrm{dist}^2(G, \tau\sqrt{n - r} \cdot \partial\|X(r, m, n)\|_{S_1}) \big] \to \rho\nu + \rho(1 - \rho\nu)(1 + \tau^2) + (1 - \rho)(1 - \rho\nu) \int_{a_- \vee \tau}^{a_+} (u - \tau)^2 \cdot \varphi_y(u)\, du.$$

Rescaling the error estimate for (C.10), we see that the error in the normalized statistical dimension is at
most $2/(n\sqrt{mr})$, which converges to zero as the parameters grow. We obtain the asymptotic result

$$\frac{1}{mn}\, \delta\big( \mathcal{D}(\|\cdot\|_{S_1}, X(r, m, n)) \big) \to \inf_{\tau \ge 0} \left\{ \rho\nu + \rho(1 - \rho\nu)(1 + \tau^2) + (1 - \rho)(1 - \rho\nu) \int_{a_- \vee \tau}^{a_+} (u - \tau)^2 \cdot \varphi_y(u)\, du \right\}.$$

This is the main conclusion (4.14). To obtain the stationary equation (4.16), we differentiate the brace with
respect to $\tau$ and set the derivative to zero. □
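The limiting expression can be evaluated with elementary quadrature. The Python sketch below is a minimal illustration under stated assumptions: it integrates against the kernel $\varphi_y$ from Fact C.1 with the midpoint rule (which avoids the endpoints of the support) and minimizes the brace over $\tau \in [0, a_+]$ by golden-section search. The function names, grid size, and iteration count are hypothetical choices.

```python
import math

def mp_singular_density(u, y):
    """Kernel phi_y from Fact C.1, supported on [1 - sqrt(y), 1 + sqrt(y)]."""
    a_minus, a_plus = 1 - math.sqrt(y), 1 + math.sqrt(y)
    if u <= 0 or not (a_minus < u < a_plus):
        return 0.0
    return math.sqrt((u**2 - a_minus**2) * (a_plus**2 - u**2)) / (math.pi * y * u)

def schatten_objective(tau, rho, nu, n=2000):
    """The brace in the limiting bound, for rho = r/m and nu = m/n."""
    y = nu * (1 - rho) / (1 - rho * nu)  # limiting aspect ratio of Z
    a_minus, a_plus = 1 - math.sqrt(y), 1 + math.sqrt(y)
    lo = max(a_minus, tau)
    integral = 0.0
    if lo < a_plus:
        h = (a_plus - lo) / n
        for i in range(n):
            u = lo + (i + 0.5) * h  # midpoint of the i-th cell
            integral += (u - tau) ** 2 * mp_singular_density(u, y) * h
    return rho * nu + rho * (1 - rho * nu) * (1 + tau**2) + (1 - rho) * (1 - rho * nu) * integral

def schatten_statdim_limit(rho, nu, iters=60):
    """Minimize the brace over tau in [0, a_+]; beyond a_+ the brace only increases."""
    y = nu * (1 - rho) / (1 - rho * nu)
    lo, hi = 0.0, 1 + math.sqrt(y)
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if schatten_objective(a, rho, nu) < schatten_objective(b, rho, nu):
            hi = b
        else:
            lo = a
    tau = (lo + hi) / 2
    return schatten_objective(tau, rho, nu), tau

print(schatten_statdim_limit(rho=0.2, nu=0.5))
```

Because $\varphi_y$ has unit second moment, the brace equals 1 at $\tau = 0$, which gives a convenient check on the quadrature.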

C.4. Permutahedra and finite reflection groups. In this section, we use a deep connection between conic
geometry and classical combinatorics to compute the statistical dimension of the normal cone of a (signed)
permutahedron. This computation requires the full power of Proposition 5.11, the characterization of the
statistical dimension as the mean intrinsic volume of a cone.

A finite reflection group is a finite subgroup of the orthogonal group $O_d$ that is generated by reflections
across hyperplanes [ST54, Ste59, CM72, BB10]. Each finite reflection group $\mathcal{G}$ partitions $\mathbb{R}^d$ into a set
$\{ U C_{\mathcal{G}} : U \in \mathcal{G} \}$ of polyhedral cones called chambers. The chambers of the infinite families $A_{d-1}$ and $BC_d$ of irreducible
finite reflection groups are isometric to the cones

$$C_A := \{ x \in \mathbb{R}^d : x_1 \le \cdots \le x_d \} \quad\text{and}\quad C_{BC} := \{ x \in \mathbb{R}^d : 0 \le x_1 \le \cdots \le x_d \}.$$

It turns out that the chambers $C_A$ and $C_{BC}$ coincide with the normal cones of certain permutahedra.

Fact C.2 (Normal cones of permutahedra). Suppose that the vector $x$ has distinct entries. Then the normal cone
$\mathcal{N}(\mathcal{P}(x), x)$ is isometric to $C_A$ and the normal cone $\mathcal{N}(\mathcal{P}_\pm(x), x)$ is isometric to $C_{BC}$.

See [HLT11, Sec. 2] for a proof of Fact C.2.

We claim that the statistical dimensions of the chambers $C_A$ and $C_{BC}$ can be expressed as

$$\delta(C_A) = H_d \quad\text{and}\quad \delta(C_{BC}) = \tfrac12 H_d, \tag{C.12}$$

where $H_d := \sum_{i=1}^{d} i^{-1}$ is the $d$th harmonic number. Proposition 4.9 follows immediately when we combine this
statement with Fact C.2.

Let us explain how the theory of finite reflection groups allows us to deduce the expression (C.12) for
the statistical dimension of the chambers. First, it follows from [BZ09] and the characterization [SW08,
Eq. (6.50)] of intrinsic volumes in terms of polytope angles that

$$v_k(C_{\mathcal{G}}) = \frac{|\mathcal{H}_k|}{|\mathcal{G}|},$$

where $\mathcal{H}_k$ is the subset of $\mathcal{G}$ consisting of matrices with a $(d - k)$-dimensional null space. Define the generating
polynomial of the intrinsic volumes

$$q_{\mathcal{G}}(s) := \sum_{k=0}^{d} v_k(C_{\mathcal{G}})\, s^k = \frac{1}{|\mathcal{G}|} \sum_{k=0}^{d} |\mathcal{H}_k|\, s^k.$$

This polynomial is a well-studied object in the theory of finite reflection groups, and it has many applications
in conic geometry as well [Ame11, Sec. 4.4]. For our purposes, we only need the relationships

$$q_{\mathcal{G}}(1) = 1 \quad\text{and}\quad \frac{d}{ds} q_{\mathcal{G}}(1) = \sum_{k=1}^{d} k\, v_k(C_{\mathcal{G}}) = \delta(C_{\mathcal{G}}).$$

These points follow immediately from (5.1) and Proposition 5.11.

The roots $\{\zeta_k : k = 1, \dots, d\}$ of the polynomial $q_{\mathcal{G}}$ are called the (negative) exponents of the reflection
group [CM72, Sec. 7.9]. Factoring the generating polynomial, we obtain a concise expression for the
statistical dimension:

$$\delta(C_{\mathcal{G}}) = \frac{d}{ds} q_{\mathcal{G}}(1) = \left( \frac{1}{|\mathcal{G}|} \prod_{k=1}^{d} (s - \zeta_k) \right) \sum_{k=1}^{d} \frac{1}{s - \zeta_k} \Bigg|_{s=1} = \sum_{k=1}^{d} \frac{1}{1 - \zeta_k}.$$

We can deduce the value of the large parenthesis because of the normalization $q_{\mathcal{G}}(1) = 1$. The exponents
associated with the groups $A_{d-1}$ and $BC_d$ are collected in [CM72, Tab. 10], from which it follows immediately
that

$$\delta(C_A) = \sum_{k=1}^{d} \frac{1}{k} \quad\text{and}\quad \delta(C_{BC}) = \sum_{k=1}^{d} \frac{1}{2k}.$$

This completes the proof of the claim (C.12).
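The factorization argument can be replayed in exact arithmetic. The Python sketch below builds the generating polynomial from its roots, normalizes it so that the coefficients sum to one, and checks that the mean of the resulting distribution is a harmonic sum. The root lists are an assumption taken from the standard tables of exponents: $0, -1, \dots, -(d-1)$ for $A_{d-1}$ acting on $\mathbb{R}^d$, and $-1, -3, \dots, -(2d-1)$ for $BC_d$. The helper names are illustrative.

```python
from fractions import Fraction

def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod_k (s - root_k)."""
    coeffs = [Fraction(1)]
    for r in roots:
        new = [Fraction(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] += c   # contribution of s * (c s^i)
            new[i] -= r * c   # contribution of -r * (c s^i)
        coeffs = new
    return coeffs

def intrinsic_volumes_from_exponents(exponents):
    """Normalize prod_k (s + e_k) so that the polynomial equals one at s = 1."""
    coeffs = poly_from_roots([-Fraction(e) for e in exponents])
    total = sum(coeffs)
    return [c / total for c in coeffs]

def harmonic(d):
    return sum(Fraction(1, i) for i in range(1, d + 1))

d = 6
v_A = intrinsic_volumes_from_exponents(range(0, d))          # assumed roots 0, -1, ..., -(d-1)
v_BC = intrinsic_volumes_from_exponents(range(1, 2 * d, 2))  # assumed roots -1, -3, ..., -(2d-1)
mean_A = sum(k * v for k, v in enumerate(v_A))
mean_BC = sum(k * v for k, v in enumerate(v_BC))
print(mean_A, harmonic(d), mean_BC)
```

Using `Fraction` keeps every step exact, so the comparison with the harmonic number is an identity rather than a floating-point approximation.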

Appendix D. Technical lemmas for concentration of intrinsic volumes 

This appendix contains the technical results that undergird the proof of the result on concentration of 
intrinsic volumes, Theorem 6.1, and the approximate kinematic bound, Theorem 7.1. 

D.1. Interlacing of tail functionals. First, we establish the interlacing inequality for the tail functionals.
This result is a straightforward consequence of the Crofton formula (5.10).

Proof of Proposition 5.6. Let $L_{d-k+1}$ be a linear subspace of dimension $d - k + 1$, and let $L_{d-k}$ be a linear
subspace of dimension $d - k$ inside $L_{d-k+1}$. The Crofton formula (5.10) shows that the half-tail functionals
are weakly decreasing:

$$2 h_{k+1}(C) = \mathbb{P}\{ C \cap Q L_{d-k} \ne \{0\} \} \le \mathbb{P}\{ C \cap Q L_{d-k+1} \ne \{0\} \} = 2 h_k(C),$$

where the inequality follows from the containment of the subspaces. We can express the tail functional $t_k$ as
the average of the half-tail functionals:

$$\tfrac12 t_k(C) = \tfrac12 \big[ h_k(C) + h_{k+1}(C) \big].$$

Therefore, $2 h_k(C) \ge t_k(C) \ge 2 h_{k+1}(C)$. □
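The interlacing inequality can be checked on a concrete cone. The Python sketch below uses the nonnegative orthant in $\mathbb{R}^d$, whose intrinsic volumes are the binomial weights $v_k = 2^{-d}\binom{d}{k}$; this example and the helper names are our own choices for illustration. Here the tail functional is taken to be $t_k = v_k + v_{k+1} + \cdots$ and the half-tail functional to be $h_k = v_k + v_{k+2} + \cdots$, which is consistent with the identity $t_k = h_k + h_{k+1}$ used in the proof.

```python
from math import comb

def orthant_intrinsic_volumes(d):
    """Intrinsic volumes of the nonnegative orthant in R^d."""
    return [comb(d, k) / 2**d for k in range(d + 1)]

def tail(v, k):
    """t_k = v_k + v_{k+1} + ..."""
    return sum(v[k:])

def half_tail(v, k):
    """h_k = v_k + v_{k+2} + ..."""
    return sum(v[k::2])

v = orthant_intrinsic_volumes(10)
for k in range(len(v)):
    print(k, 2 * half_tail(v, k), tail(v, k), 2 * half_tail(v, k + 1))
```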

D.2. Bounds for tropic functions. We continue with the proof of Lemma 6.2, which provides a bound on
the tropic functions. This argument is based on an approximation formula from the venerable compendium of
Abramowitz & Stegun [AS64, Sec. 26.5.21].

Fact D.1 (Approximation of beta distributions). Let $X$ be a beta$(a, b)$ random variable. Assume that $a + b > 6$
and $(a + b - 1)(1 - x) \ge 0.8$. Define the quantity $y$ via the formula

$$y = \frac{3 \left[ w_1 \left( 1 - \tfrac{1}{9b} \right) - w_2 \left( 1 - \tfrac{1}{9a} \right) \right]}{\left[ w_1^2 / b + w_2^2 / a \right]^{1/2}} \quad\text{where}\quad w_1 = (bx)^{1/3} \text{ and } w_2 = (a(1 - x))^{1/3}.$$

Then

$$\mathbb{P}\{X \le x\} = \Phi(y) + \varepsilon(x) \quad\text{with}\quad |\varepsilon(x)| < 5 \cdot 10^{-3}.$$

The function $\Phi$ represents the cumulative distribution of a standard normal random variable.

Proof of Lemma 6.2. We need to show that the tropic function $I_k^d(k/d) \ge 0.3$ for all integers $k$ and $d$ that satisfy
$1 \le k \le d - 1$. (The cases $k = 0$ and $k = d$ are trivial.) To accomplish this goal, we represent the tropic function
in terms of a beta random variable, and we apply Fact D.1 to approximate its value.

Let $X \sim \mathrm{BETA}(a, b)$ with shape parameters $a = \tfrac12 k$ and $b = \tfrac12 (d - k)$. In particular, $\mathbb{E}[X] = k/d$. It is well
known [Art02, Sec. 2] that the tropic function can be expressed in terms of this random variable:

$$1 - I_k^d(k/d) = \mathbb{P}\{ \|\Pi_{L_k}(\theta)\|^2 < k/d \} = \mathbb{P}\{ X < \mathbb{E}[X] \}.$$

We must show that this probability is bounded above by 0.7.

First, the mean-median-mode inequality [vdVW93] implies that the mean $k/d$ of $X$ is smaller than the
median when $k \ge \tfrac12 d$, so the probability is bounded by 0.5 in this regime.

Second, the cases where $a + b \le 6$ correspond with the situation where $d \le 12$. We can enumerate the cases
where $d \le 12$ and $0 < k < d/2$. In each case, we verify numerically that the required probability is less than 0.7.

It is easy to check that for $a = \tfrac12 k$, $b = \tfrac12 (d - k)$, and $x = k/d$, the inequality $(a + b - 1)(1 - x) \ge 0.8$ holds when
$0 < k < \tfrac12 d$ and $d > 12$. Instantiating the formula from Fact D.1 and simplifying, we reach

$$y = \frac{\sqrt{2}}{3} \cdot \frac{d - 2k}{\sqrt{dk(d - k)}} \le \frac{\sqrt{2}}{3}.$$

Indeed, for each $d$, the extremal choice is $k = 1$. The function $\Phi$ is increasing, so we conclude that

$$\mathbb{P}\{X \le k/d\} \le \Phi\left( \frac{\sqrt{2}}{3} \right) + 5 \cdot 10^{-3} < 0.69.$$

The latter bound results from numerical computation. □
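The two numerical claims at the end of this proof can be reproduced in a few lines. The Python sketch below assumes that the quantity $y$ in Fact D.1 is the standard expression from Abramowitz & Stegun 26.5.21, namely $y = 3[w_1(1 - \tfrac{1}{9b}) - w_2(1 - \tfrac{1}{9a})] / [w_1^2/b + w_2^2/a]^{1/2}$. It checks that this expression collapses to the simplified formula when $a = \tfrac12 k$, $b = \tfrac12(d - k)$, $x = k/d$, and it evaluates the final bound. The function names are hypothetical.

```python
import math

def normal_cdf(t):
    """Cumulative distribution function of a standard normal variable."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def as_y(a, b, x):
    """The quantity y, assuming the Abramowitz & Stegun 26.5.21 expression."""
    w1 = (b * x) ** (1 / 3)
    w2 = (a * (1 - x)) ** (1 / 3)
    numerator = 3 * (w1 * (1 - 1 / (9 * b)) - w2 * (1 - 1 / (9 * a)))
    return numerator / math.sqrt(w1**2 / b + w2**2 / a)

def simplified_y(k, d):
    """Closed form of y for a = k/2, b = (d-k)/2, x = k/d."""
    return math.sqrt(2) / 3 * (d - 2 * k) / math.sqrt(d * k * (d - k))

d, k = 40, 3
print(as_y(k / 2, (d - k) / 2, k / d), simplified_y(k, d))
print(normal_cdf(math.sqrt(2) / 3) + 5e-3)  # should fall below 0.69
```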

D.3. The projection of a spherical variable onto a cone. In this section, we establish Lemma 6.3, which
controls the probability that a spherical random variable has an unusually large projection on a cone. Although
the lemma is framed in terms of a spherical random variable, it is cleaner to derive the result using Gaussian
methods. We require an exponential moment inequality [Bog98, Cor. 1.7.9] that ultimately depends on the
Gaussian logarithmic Sobolev inequality.

Fact D.2 (Exponential moments for a function of a Gaussian variable). Suppose that $F : \mathbb{R}^d \to \mathbb{R}$ satisfies
$\mathbb{E} F(g) = 0$. Assume moreover that $F$ belongs to the Gaussian Sobolev class $H^1(\gamma_d)$, i.e., the squared norm of its
gradient is integrable with respect to the Gaussian measure. Then

$$\mathbb{E}\, e^{\xi F(g)} \le \left[ \mathbb{E}\, e^{(\xi/4) \|\nabla F(g)\|^2} \right]^{2\xi/(1 - 2\xi)} \quad\text{when } 0 < \xi < \tfrac12. \tag{D.1}$$

Using this exponential moment bound, we reach an elegant estimate for the moment generating function
of the squared projection of a standard normal vector onto a cone.

Sublemma D.3 (Exponential moment bounds). Let $K \in \mathcal{C}_d$ be a closed convex cone. Then

$$\mathbb{E}\, e^{\xi (\|\Pi_K(g)\|^2 - \delta(K))} \le \exp\left( \frac{2 \xi^2 \delta(K)}{1 - 4|\xi|} \right) \quad\text{for } -\tfrac14 < \xi < \tfrac14. \tag{D.2}$$

Proof. The result is trivial when $\xi = 0$, so we may limit our attention to the case where the parameter $\xi$ is
strictly positive or strictly negative. First, suppose that $\xi > 0$. Consider the zero-mean function

$$F(g) = \|\Pi_K(g)\|^2 - \delta(K) \quad\text{with}\quad \|\nabla F(g)\|^2 = 4 \|\Pi_K(g)\|^2. \tag{D.3}$$

The gradient calculation follows from (3.10), and it is easy to see that $F \in H^1(\gamma_d)$ because the projection onto
a cone is a contraction. The exponential moment bound (D.1) delivers the estimate

$$\mathbb{E}\, e^{\xi F(g)} \le \left( \mathbb{E}\, e^{\xi \|\Pi_K(g)\|^2} \right)^{2\xi/(1 - 2\xi)} = \exp\left( \frac{2 \xi^2 \delta(K)}{1 - 2\xi} \right) \cdot \left[ \mathbb{E}\, e^{\xi F(g)} \right]^{2\xi/(1 - 2\xi)}.$$

The second relation follows when we add and subtract $\xi \delta(K)$ in the exponential function. We have compared
the moment generating function of $F(g)$ with itself. Solving the relation, we obtain the inequality

$$\mathbb{E}\, e^{\xi F(g)} \le \exp\left( \frac{2 \xi^2 \delta(K)}{1 - 4\xi} \right) \quad\text{for } 0 < \xi < \tfrac14.$$

This is the bound (D.2) for the positive range of parameters.




Now, we turn to the negative range of parameters, which requires a more convoluted argument. To make
the analysis clearer, we continue to assume that $\xi > 0$, and we write the negation explicitly. Replacing $F$ by
$-F$, the exponential moment bound (D.1) yields

$$\mathbb{E}\, e^{-\xi F(g)} = \mathbb{E}\, e^{\xi (-F)(g)} \le \left( \mathbb{E}\, e^{\xi \|\Pi_K(g)\|^2} \right)^{2\xi/(1 - 2\xi)}. \tag{D.4}$$

This time, we cannot identify a copy of the left-hand side on the right-hand side. Instead, let us run the
moment comparison argument directly on the remaining expectation:

$$\mathbb{E}\, e^{\xi \|\Pi_K(g)\|^2} = e^{\xi \delta(K)} \cdot \mathbb{E}\, e^{\xi F(g)} \le e^{\xi \delta(K)} \left( \mathbb{E}\, e^{\xi \|\Pi_K(g)\|^2} \right)^{2\xi/(1 - 2\xi)}.$$

The last inequality follows from the exponential moment bound (D.1), just as before. Solving this relation,
we obtain

$$\mathbb{E}\, e^{\xi \|\Pi_K(g)\|^2} \le \exp\left( \frac{(1 - 2\xi)\, \xi\, \delta(K)}{1 - 4\xi} \right) \quad\text{for } 0 < \xi < \tfrac14.$$

Introduce the latter inequality into (D.4) to reach

$$\mathbb{E}\, e^{-\xi F(g)} \le \exp\left( \frac{2 \xi^2 \delta(K)}{1 - 4\xi} \right).$$

This estimate addresses the remaining part of the parameter range in (D.2). □

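With this result at hand, we can easily prove the tail bound for the projection of a spherical random variable
onto a cone.

Proof of Lemma 6.3. For a parameter $\xi > 0$, the Laplace transform method [Bar05, Sec. 2] delivers

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} \le e^{-\xi\lambda - \xi\delta(C)} \cdot \mathbb{E}\, e^{\xi d \|\Pi_C(\theta)\|^2}.$$

Let $R$ be a chi random variable with $d$ degrees of freedom, independent from $\theta$. Using Jensen's inequality, we
can bound the expectation:

$$\mathbb{E}\, e^{\xi d \|\Pi_C(\theta)\|^2} = \mathbb{E}\, e^{\xi (\mathbb{E} R^2) \|\Pi_C(\theta)\|^2} \le \mathbb{E}\, e^{\xi \|\Pi_C(R\theta)\|^2} = \mathbb{E}\, e^{\xi \|\Pi_C(g)\|^2}.$$

Combining these results, we obtain

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} \le e^{-\xi\lambda} \cdot \mathbb{E}\, e^{\xi (\|\Pi_C(g)\|^2 - \delta(C))}. \tag{D.5}$$

Substitute the inequality for the moment generating function (D.2) with $K = C$ into (D.5) to reach

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} \le e^{-\xi\lambda} \cdot \exp\left( \frac{2 \xi^2 \delta(C)}{1 - 4\xi} \right) \quad\text{for } 0 < \xi < \tfrac14.$$

Select $\xi = \lambda / (4\delta(C) + 4\lambda)$ to complete the first half of the argument.

The second half of the proof results in an analogous bound with $C$ replaced by $C^\circ$. Note that

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} = \mathbb{P}\{ d\, (\|\Pi_C(\theta)\|^2 - 1) + (d - \delta(C)) \ge \lambda \} = \mathbb{P}\{ \delta(C^\circ) - d\, \|\Pi_{C^\circ}(\theta)\|^2 \ge \lambda \}.$$

The second relation follows from the Pythagorean identity (3.8) and the totality law (4.4). Repeating the
Laplace transform argument from above, with $\xi > 0$, we obtain the inequality

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} \le e^{-\xi\lambda} \cdot \mathbb{E}\, e^{-\xi (\|\Pi_{C^\circ}(g)\|^2 - \delta(C^\circ))}. \tag{D.6}$$

Introduce the bound (D.2) with $K = C^\circ$ into (D.6) to see that

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} \le e^{-\xi\lambda} \cdot \exp\left( \frac{2 \xi^2 \delta(C^\circ)}{1 - 4\xi} \right) \quad\text{for } 0 < \xi < \tfrac14.$$

Choose $\xi = \lambda / (4\delta(C^\circ) + 4\lambda)$ to complete the second half of the argument. □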
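The effect of the stated choice of the parameter can be verified directly. The Python sketch below assumes that the moment bound (D.2) has the form $\exp(2\xi^2\delta/(1 - 4|\xi|))$, so that the exponent in the tail bound is $-\xi\lambda + 2\xi^2\delta/(1 - 4\xi)$. Under that assumption, substituting $\xi = \lambda/(4\delta + 4\lambda)$ gives the exponent $-\lambda^2/(8(\delta + \lambda))$. The function names are illustrative.

```python
def chernoff_exponent(xi, lam, delta):
    """Exponent -xi*lam + 2*xi^2*delta/(1 - 4*xi), valid for 0 < xi < 1/4."""
    return -xi * lam + 2 * xi**2 * delta / (1 - 4 * xi)

def chosen_xi(lam, delta):
    """The parameter choice made at the end of each half of the proof."""
    return lam / (4 * delta + 4 * lam)

def closed_form_exponent(lam, delta):
    """Value of the exponent at the chosen parameter."""
    return -(lam**2) / (8 * (delta + lam))

for delta, lam in [(5.0, 1.0), (50.0, 10.0), (0.5, 3.0)]:
    xi = chosen_xi(lam, delta)
    print(xi, chernoff_exponent(xi, lam, delta), closed_form_exponent(lam, delta))
```

Note that the chosen parameter always lies in $(0, \tfrac14)$ when $\lambda > 0$ and $\delta > 0$, so the moment bound is applicable.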
D.4. Tail functionals of a product. Finally, we argue that the tail functional of a product cone is controlled
by the tail functionals of the two factors. This result is very simple, and it can surely be strengthened by
taking advantage of the structure of the product.

Proof of Lemma 7.2. Let $C, K \in \mathcal{C}_d$ be closed convex cones. Define independent random variables $X$ and $Y$
that take values in $\{0, 1, \dots, d\}$ and whose distributions are given by the intrinsic volumes of the cones $C$ and $K$,
respectively. That is,

$$\mathbb{P}\{X = k\} = v_k(C) \quad\text{and}\quad \mathbb{P}\{Y = k\} = v_k(K) \quad\text{for } k = 0, 1, 2, \dots, d.$$

According to the rule (5.4) for the intrinsic volumes of a product cone,

$$\mathbb{P}\{X + Y = k\} = v_k(C \times K) \quad\text{for } k = 0, 1, 2, \dots, 2d.$$

By dint of this identity, we can use probabilistic reasoning to bound the tail functionals of the product cone $C \times K$.
Indeed, observe that

$$\mathbb{P}\{X + Y \ge \delta(C) + \delta(K) + 2\lambda\} \le \mathbb{P}\{X \ge \delta(C) + \lambda\} + \mathbb{P}\{Y \ge \delta(K) + \lambda\}.$$

We can rewrite this inequality in terms of tail functionals:

$$t_{\lceil \delta(C) + \delta(K) + 2\lambda \rceil}(C \times K) \le t_{\lceil \delta(C) + \lambda \rceil}(C) + t_{\lceil \delta(K) + \lambda \rceil}(K).$$

This is the advertised conclusion. □
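The probabilistic argument translates directly into a computation on sequences of intrinsic volumes. The Python sketch below uses nonnegative orthants as test cones, with intrinsic volumes $v_k = 2^{-d}\binom{d}{k}$; this choice of example and the helper names are ours. It forms the intrinsic volumes of the product by convolution, as in the rule (5.4), and checks the tail inequality for several values of $\lambda$.

```python
from math import ceil, comb

def orthant_intrinsic_volumes(d):
    """Intrinsic volumes of the nonnegative orthant in R^d."""
    return [comb(d, k) / 2**d for k in range(d + 1)]

def product_intrinsic_volumes(v_c, v_k):
    """Distribution of X + Y for independent X ~ v_c and Y ~ v_k (a convolution)."""
    out = [0.0] * (len(v_c) + len(v_k) - 1)
    for i, a in enumerate(v_c):
        for j, b in enumerate(v_k):
            out[i + j] += a * b
    return out

def statdim(v):
    """Statistical dimension as the mean of the intrinsic volume distribution."""
    return sum(k * vk for k, vk in enumerate(v))

def tail(v, k):
    """Tail functional t_k = v_k + v_{k+1} + ..."""
    return sum(v[k:])

v_c, v_k = orthant_intrinsic_volumes(3), orthant_intrinsic_volumes(4)
v_prod = product_intrinsic_volumes(v_c, v_k)
for lam in (0.0, 0.5, 1.0, 2.5):
    lhs = tail(v_prod, ceil(statdim(v_c) + statdim(v_k) + 2 * lam))
    rhs = tail(v_c, ceil(statdim(v_c) + lam)) + tail(v_k, ceil(statdim(v_k) + lam))
    print(lam, lhs, rhs)
```

Since the product of two orthants is again an orthant, the convolution should reproduce the binomial weights in dimension 7, which gives an independent check of the product rule.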

Appendix E. The relationship between statistical dimension and Gaussian width 

This appendix contains a short proof of Proposition 10.1, which states that the Gaussian width of a spherical 
convex set is comparable with the statistical dimension of the cone generated by the set. 

Proof of Proposition 10.1. Let $C \in \mathcal{C}_d$ be a closed convex cone. It is easy to check that the statistical dimension
of $C$ dominates its Gaussian width:

$$w^2(C) := \Big[ \mathbb{E} \sup_{x \in C \cap S^{d-1}} \langle g, x \rangle \Big]^2 \le \Big[ \mathbb{E} \sup_{x \in C \cap B^d} \langle g, x \rangle \Big]^2 = \big[ \mathbb{E}\, \mathrm{dist}(g, C^\circ) \big]^2 \le \mathbb{E}\big[ \mathrm{dist}^2(g, C^\circ) \big] = \delta(C).$$

The first inequality holds because we have enlarged the range of the supremum, and the second relation
depends on a standard duality argument [CRPW12, Eqns. (30), (33)]. Afterward, we invoke Jensen's
inequality, and we recognize the polar form (4.4) of the statistical dimension.

For the reverse inequality, define the random variable $Z := Z(g) := \sup_{x \in C \cap S^{d-1}} \langle g, x \rangle$, and note that
$w(C) = \mathbb{E} Z$. The function $g \mapsto Z(g)$ is 1-Lipschitz because the supremum occurs over a subset of the Euclidean
unit sphere. Therefore, we can bound the fluctuation of $Z$ as follows:

$$\mathbb{E}[Z^2] - w^2(C) = \mathbb{E}[(Z - \mathbb{E} Z)^2] = \mathrm{Var}(Z) \le 1. \tag{E.1}$$

The last inequality follows from Fact B.3.

As a consequence of (E.1), we obtain the required bound $\delta(C) \le w^2(C) + 1$ as soon as we verify that
$\delta(C) \le \mathbb{E}[Z^2]$. Since $Z^2$ is a nonnegative random variable,

$$\mathbb{E}[Z^2] \ge \mathbb{E}\big[ Z^2 \cdot \mathbb{1}_{\mathbb{R}^d \setminus C^\circ}(g) \big] = \mathbb{E}\Big[ \Big( \sup_{x \in C \cap S^{d-1}} \langle g, x \rangle \Big)^2 \cdot \mathbb{1}_{\mathbb{R}^d \setminus C^\circ}(g) \Big],$$

where $\mathbb{1}_E$ denotes the indicator of the event $E$. We claim that the right-hand side of this inequality equals
the statistical dimension $\delta(C)$. Indeed, for any $u \notin C^\circ$, a homogeneity argument delivers $\sup_{x \in C \cap S^{d-1}} \langle u, x \rangle =
\sup_{x \in C \cap B^d} \langle u, x \rangle$. On the other hand, when $u \in C^\circ$, we have the relation $\sup_{x \in C \cap B^d} \langle u, x \rangle = 0$. Combine
these observations to reach

$$\mathbb{E}[Z^2] \ge \mathbb{E}\Big[ \Big( \sup_{x \in C \cap S^{d-1}} \langle g, x \rangle \Big)^2 \cdot \mathbb{1}_{\mathbb{R}^d \setminus C^\circ}(g) \Big] = \mathbb{E}\Big[ \Big( \sup_{x \in C \cap B^d} \langle g, x \rangle \Big)^2 \Big] = \mathbb{E}\big[ \mathrm{dist}^2(g, C^\circ) \big].$$

The last relation follows from the same duality argument mentioned above. On account of (4.4), we identify
the right-hand side as the statistical dimension $\delta(C)$. □
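The two-sided comparison $w^2(C) \le \delta(C) \le w^2(C) + 1$ can be observed in a small Monte Carlo experiment. The Python sketch below takes $C$ to be the nonnegative orthant in $\mathbb{R}^d$, an example chosen here for illustration because both quantities have simple sample formulas: $\mathrm{dist}(g, C^\circ) = \|\mathrm{Pos}(g)\|$, and the supremum over $C \cap S^{d-1}$ equals $\|\mathrm{Pos}(g)\|$ when $g$ has a positive entry and the largest entry of $g$ otherwise. The function name, sample size, and seed are arbitrary.

```python
import math
import random

def orthant_width_and_statdim(d, trials=20000, seed=0):
    """Monte Carlo estimates of the Gaussian width w(C) and of delta(C) for the orthant."""
    rng = random.Random(seed)
    sum_z = 0.0
    sum_dist2 = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        pos2 = sum(t * t for t in g if t > 0)  # ||Pos(g)||^2 = dist^2(g, polar cone)
        # Supremum of <g, x> over the part of the unit sphere inside the orthant.
        z = math.sqrt(pos2) if pos2 > 0 else max(g)
        sum_z += z
        sum_dist2 += pos2
    return sum_z / trials, sum_dist2 / trials

width, delta = orthant_width_and_statdim(d=20)
print(width**2, delta, width**2 + 1)
```

For the orthant, the statistical dimension is exactly $d/2$, so the second estimate should be close to 10 when $d = 20$.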






Acknowledgments 

DA is with the School of Operations Research and Information Engineering, Cornell University. Research 
supported by DFG grant AM 386/1-1. 

ML is with the School of Mathematics, The University of Manchester. Research supported by Leverhulme 
Trust grant R41617 and a Seggie Brown Fellowship of the University of Edinburgh. 

MBM and JAT are with the Department of Computing and Mathematical Sciences, California Institute of 
Technology. Research supported by ONR awards N00014-08-1-0883 and N00014-11-1002, AFOSR award 
FA9550-09-1-0643, and a Sloan Research Fellowship. 

The authors wish to thank Babak Hassibi and Samet Oymak for helpful discussions on the connection 
between phase transitions and minimax risk. Jared Tanner provided detailed information about contemporary 
research on phase transitions for random linear inverse problems. Venkat Chandrasekaran offered advice on 
some aspects of convex geometry. 



References 

[AB11] D. Amelunxen and P. Bürgisser. Probabilistic analysis of the Grassmann condition number. Available at
arxiv.org/abs/1112.2603, 2011.

[AB12] D. Amelunxen and P. Bürgisser. Intrinsic volumes of symmetric cones. Available at arxiv.org/abs/1205.1863, 2012.
[AF03] M. J. Ablowitz and A. S. Fokas. Complex variables: Introduction and applications. Cambridge Texts in Applied Mathematics. 

Cambridge University Press, Cambridge, 2nd edition, 2003. 
[AG03] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., 95(1, Ser. B):3-51, 2003. ISMP 2000, Part 3 

(Atlanta, GA). 

[All48] C. B. Allendoerfer. Steiner's formulae on a general $S^{n+1}$. Bull. Amer. Math. Soc., 54:128-135, 1948.

[Ame11] D. Amelunxen. Geometric analysis of the condition of the convex feasibility problem. PhD Thesis, Univ. Paderborn, 2011.

[And84] T. W. Anderson. An introduction to multivariate statistical analysis. Wiley Series in Probability and Mathematical Statistics: 

Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 2nd edition, 1984. 
[Art02] S. Artstein. Proportional concentration phenomena on the sphere. Israel J. Math., 132:337-358, 2002. 
[AS64] M. Abramowitz and I. A. Stegun. Handbook of mathematical functions with formulas, graphs, and mathematical tables, 

volume 55 of National Bureau of Standards Applied Mathematics Series. For sale by the Superintendent of Documents, U.S. 

Government Printing Office, Washington, D.C., 1964. 
[AS92] F. Affentranger and R. Schneider. Random projections of regular simplices. Discrete Comput. Geom., 7(3):219-226, 1992. 
[Bar05] A. Barvinok. Math 710: Measure concentration. Lecture notes. Available at
www.math.lsa.umich.edu/~barvinok/total710.pdf, 2005.

[BB10] A. V. Borovik and A. Borovik. Mirrors and reflections. Universitext. Springer, New York, 2010. The geometry of finite reflection 
groups. 

[BH99] K. Böröczky, Jr. and M. Henk. Random projections of regular polytopes. Arch. Math. (Basel), 73(6):465-473, December 1999.
[BLM12] M. Bayati, M. Lelarge, and A. Montanari. Universality in polytope phase transitions and message passing algorithms. Available
at arxiv.org/abs/1207.7321, 2012.
[BM12] M. Bayati and A. Montanari. The LASSO risk for Gaussian matrices. IEEE Trans. Inform. Theory, 58(4):1997-2017, 2012. 
[BMS06] J. Bobin, Y. Moudden, and J.-L. Starck. Morphological diversity and source separation. IEEE Trans. Signal Process., 13(7):409- 

412, 2006. 

[Bog98] V. I. Bogachev. Gaussian measures, volume 62 of Mathematical Surveys and Monographs. American Mathematical Society, 
Providence, RI, 1998. 

[BS10] Z. Bai and J. W. Silverstein. Spectral analysis of large dimensional random matrices. Springer Series in Statistics. Springer, 
New York, 2nd edition, 2010. 

[BTN01] A. Ben-Tal and A. Nemirovski. Lectures on modern convex optimization. MPS/SIAM Series on Optimization. Society for 

Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001. Analysis, algorithms, and engineering applications. 
[BV04] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, Cambridge, 2004. 

[BZ09] P. V. Bibikov and V. S. Zhgoon. Angle measures of some cones associated with finite reflection groups. J. Lie Theory, 
19(4):767-769, 2009. 

[CDS01] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Rev., 43(1):129-159, 2001. 

Reprinted from SIAM J. Sci. Comput. 20 (1998), no. 1, 33-61 (electronic). 
[CM72] H. S. M. Coxeter and W. O. J. Moser. Generators and relations for discrete groups. Springer-Verlag, New York, 3rd edition, 

1972. Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 14. 
[CM73] J. F. Claerbout and F. Muir. Robust modeling of erratic data. Geophysics, 38(5):826-844, October 1973. 
[CRPW12] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. Found. Comput. 

Math., 12(6):805-849, 2012. 

[CSPW11] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM J.
Optim., 21(2):572-596, 2011.

[CT06] E. J. Candes and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. 
Inform. Theory, 52(12) :5406-5425, 2006. 






[DGM13] D. L. Donoho, M. Gavish, and A. Montanari. The phase transition of matrix recovery from Gaussian measurements matches
the minimax MSE of matrix denoising. Available at arxiv.org/abs/1302.2331, 2013.
[DJM11] D. L. Donoho, I. Johnstone, and A. Montanari. Accurate prediction of phase transitions in compressed sensing via a connection
to minimax denoising. Available at arxiv.org/abs/1111.1041, 2011.
[DMM09a] D. L. Donoho, A. Maleki, and A. Montanari. Message-passing algorithms for compressed sensing. Proc. Nat. Acad. Sci. U.S.A., 

106(45):18914-18919, 2009. 

[DMM09b] D. L. Donoho, A. Maleki, and A. Montanari. Supporting information to: Message-passing algorithms for compressed sensing. 

Proc. Nat. Acad. Sci. U.S.A., 2009. 
[Don06a] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289-1306, 2006. 

[Don06b] D. L. Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete 

Comput. Geom., 35(4):617-652, 2006. 
[DT05] D. L. Donoho and J. Tanner. Neighborliness of randomly projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA, 

102(27) :9452-9457 (electronic), 2005. 
[DT09a] D. L. Donoho and J. Tanner. Counting faces of randomly projected polytopes when the projection radically lowers dimension.
J. Amer. Math. Soc., 22(1):1-53, 2009.
[DT09b] D. L. Donoho and J. Tanner. Observed universality of phase transitions in high-dimensional geometry, with implications for
modern data analysis and signal processing. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci., 367(1906):4273-4293,
2009.

[DT10a] D. L. Donoho and J. Tanner. Counting the faces of randomly-projected hypercubes and orthants, with applications. Discrete
Comput. Geom., 43(3):522-541, 2010.
[DT10b] D. L. Donoho and J. Tanner. Exponential bounds implying construction of compressed sensing matrices, error-correcting
codes, and neighborly polytopes by random sampling. IEEE Trans. Inform. Theory, 56(4):2002-2016, 2010.
[DZ10] A. Dembo and O. Zeitouni. Large deviations techniques and applications, volume 38 of Stochastic Modelling and Applied 

Probability. Springer-Verlag, Berlin, 2010. Corrected reprint of the second (1998) edition. 
[ESQD05] M. Elad, J.-L. Starck, P. Querre, and D. L. Donoho. Simultaneous cartoon and texture image inpainting using morphological 

component analysis (MCA) . Appl. Comput. Harmon. Anal. , 19(3):340-358, 2005. 
[Faz02] M. Fazel. Matrix rank minimization with applications. Phd thesis, Stanford University, 2002. 

[GB13] M. C. Grant and S. P. Boyd. The CVX user's guide, release 2.0 (beta). User manual. Available at cvxr.com/cvx/doc, 2013.
[GHS02] F. Gao, D. Hug, and R. Schneider. Intrinsic volumes and polar sets in spherical space. Math. Notae, 41:159-176 (2003), 

2001/02. Homage to Luis Santalo. Vol. 1 (Spanish). 
[Gla95] S. Glasauer. Integralgeometrie konvexer Körper im sphärischen Raum. PhD Thesis, Univ. Freiburg i. Br., 1995.
[Gla96] S. Glasauer. Integral geometry of spherically convex bodies. Diss. Summ. Math., 1(1-2):219-226, 1996.
[Gor85] Y. Gordon. Some inequalities for Gaussian processes and applications. Israel J. Math., 50(4):265-289, 1985.
[Gor88] Y. Gordon. On Milman's inequality and random subspaces which escape through a mesh in $\mathbb{R}^n$. In Geometric aspects of
functional analysis (1986/87), volume 1317 of Lecture Notes in Math., pages 84-106. Springer, Berlin, 1988.
[Her43] G. Herglotz. Über die Steinersche Formel für Parallelflächen. Abh. Math. Sem. Hansischen Univ., 15:165-177, 1943.
[HJ90] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, Cambridge, 1990. Corrected reprint of the 1985 

original. 

[HLT11] C. Hohlweg, C. E. M. C. Lange, and H. Thomas. Permutahedra and generalized associahedra. Adv. Math., 226(1) :608-640, 
2011. 

[HUL93a] J.-B. Hiriart-Urruty and C. Lemarechal. Convex analysis and minimization algorithms. I, volume 305 of Grundlehren der Mathe- 
matischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1993. Fundamentals. 

[HUL93b] J.-B. Hiriart-Urruty and C. Lemarechal. Convex analysis and minimization algorithms. II, volume 306 of Grundlehren der 
Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1993. Advanced 
theory and bundle methods. 

[KR97] D. A. Klain and G.-C. Rota. Introduction to geometric probability. Lezioni Lincee. [Lincei Lectures] . Cambridge University 
Press, Cambridge, 1997. 

[KXAH11] A. Khajehnejad, W. Xu, A. S. Avestimehr, and B. Hassibi. Analyzing weighted $\ell_1$ minimization for sparse recovery with
nonuniform sparse models. IEEE Trans. Signal Process., 59(5):1985-2001, 2011.
[Led01] M. Ledoux. The concentration of measure phenomenon, volume 89 of Mathematical Surveys and Monographs. American
Mathematical Society, Providence, RI, 2001.
[McM93] P. McMullen. Valuations and dissections. In Handbook of convex geometry, Vol. A, B, pages 933-988. North-Holland,
Amsterdam, 1993.

[Mez07] F. Mezzadri. How to generate random matrices from the classical compact groups. Notices Amer. Math. Soc., 54(5):592-604,
2007.

[MP67] V. A. Marčenko and L. A. Pastur. Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.),
72(114):507-536, 1967.

[MP97] M. Mesbahi and G. P. Papavassilopoulos. On the rank minimization problem over a positive semidefinite linear matrix
inequality. IEEE Trans. Automat. Control, 42(2):239-243, 1997.
[MT12] M. B. McCoy and J. A. Tropp. Sharp recovery bounds for convex deconvolution, with applications. Available at
arxiv.org/abs/1205.1580, 2012.

[OH10] S. Oymak and B. Hassibi. New null space results and recovery thresholds for matrix rank minimization. Available at
arxiv.org/abs/1011.6326, 2010.
[OH12] S. Oymak and B. Hassibi. On the equivalence of minimax risk and the phase transitions of compressed signal recovery. In
preparation, 2012.









[OKH11] S. Oymak, M. A. Khajehnejad, and B. Hassibi. Improved thresholds for rank minimization. In 2011 IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5988-5991, 2011.
[OLBC10] F. W. J. Olver, D. W. Lozier, R. F. Boisvert, and C. W. Clark, editors. NIST handbook of mathematical functions. U.S. Department
of Commerce, National Institute of Standards and Technology, Washington, DC, 2010.
[RFP10] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm
minimization. SIAM Rev., 52(3):471-501, 2010.
[Roc70] R. T. Rockafellar. Convex analysis. Princeton Mathematical Series, No. 28. Princeton University Press, Princeton, N.J., 1970. 
[RV08] M. Rudelson and R. Vershynin. On sparse reconstruction from Fourier and Gaussian measurements. Comm. Pure Appl. Math.,
61(8):1025-1045, 2008.

[RW98] R. T. Rockafellar and R. J.-B. Wets. Variational analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften
[Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1998.
[RXH11] B. Recht, W. Xu, and B. Hassibi. Null space conditions and thresholds for rank minimization. Math. Program., 127(1, Ser.
B):175-202, 2011.

[San50] L. A. Santaló. On parallel hypersurfaces in the elliptic and hyperbolic n-dimensional space. Proc. Amer. Math. Soc., 1:325-330,
1950.

[San76] L. A. Santaló. Integral geometry and geometric probability. Addison-Wesley Publishing Co., Reading, Mass.-London-Amsterdam,
1976.

[Sch50a] L. Schläfli. Gesammelte mathematische Abhandlungen. Band I. Verlag Birkhäuser, Basel, 1950.

[Sch50b] L. Schläfli. Theorie der Vielfachen Kontinuität (1852), Gesammelte mathematische Abhandlungen. Band I. Verlag Birkhäuser,
Basel, 1950.

[Sch93] R. Schneider. Convex bodies: the Brunn-Minkowski theory, volume 44 of Encyclopedia of Mathematics and its Applications.
Cambridge University Press, Cambridge, 1993.
[SDC03] J.-L. Starck, D. L. Donoho, and E. J. Candès. Astronomical image representation by the curvelet transform. Astronom.
Astrophys., 398(2):785-800, 2003.
[SED05] J.-L. Starck, M. Elad, and D. L. Donoho. Image decomposition via the combination of sparse representations and a variational
approach. IEEE Trans. Image Process., 14(10):1570-1582, October 2005.
[SS86] F. Santosa and W. W. Symes. Linear inversion of band-limited reflection seismograms. SIAM J. Sci. Statist. Comput., 7(4):1307-1330,
1986.

[ST54] G. C. Shephard and J. A. Todd. Finite unitary reflection groups. Canadian J. Math., 6:274-304, 1954. 
[Ste59] R. Steinberg. Finite reflection groups. Trans. Amer. Math. Soc., 91:493-504, 1959.

[Sto09] M. Stojnic. Various thresholds for $\ell_1$-optimization in compressed sensing. Available at arxiv.org/abs/0907.3666, 2009.
[SW08] R. Schneider and W. Weil. Stochastic and Integral Geometry. Springer series in statistics: Probability and its applications.
Springer, 2008.

[vdVW93] R. van de Ven and N. C. Weber. Bounds for the median of the negative binomial distribution. Metrika, 40(3-4):185-189,
1993.

[VS86] A. M. Vershik and P. V. Sporyshev. An asymptotic estimate for the average number of steps in the parametric simplex method.
USSR Comput. Maths. Math. Phys., 26(3):104-113, 1986.
[VS92] A. M. Vershik and P. V. Sporyshev. Asymptotic behavior of the number of faces of random polyhedra and the neighborliness
problem. Selecta Math. Soviet., 11(2):181-201, 1992. Selected translations.
[Wat92] G. A. Watson. Characterization of the subdifferential of some matrix norms. Linear Algebra Appl., 170:33-45, 1992.
[XH11] W. Xu and B. Hassibi. Precise stability phase transitions for $\ell_1$ minimization: A unified geometric framework. IEEE Trans.
Inform. Theory, 57(10):6894-6919, October 2011.



